Abstract

Let $F_k(n,m)$ be a random $k$-CNF formula formed by selecting uniformly and independently $m$ out of
all possible $k$-clauses on $n$ variables. It is well-known that if $r \ge 2^k \log 2$, then $F_k(n,rn)$ is unsatisfiable
with probability that tends to 1 as $n \to \infty$. We prove that if $r \le 2^k \log 2 - t_k$, where $t_k = O(k)$, then
$F_k(n,rn)$ is satisfiable with probability that tends to 1 as $n \to \infty$.

Our technique, in fact, yields an explicit lower bound for the random $k$-SAT threshold for every $k$.
For $k \ge 4$ our bounds improve all previously known such bounds.

1 Introduction 

Call a disjunction of $k$ literals a $k$-clause. For a set $V$ of $n$ Boolean variables, let $C_k(V)$ denote the
set of all $2^k n^k$ possible $k$-clauses on $V$. A random $k$-CNF formula $F_k(n,m)$ is formed by selecting uniformly,
independently and with replacement $m$ clauses from $C_k(V)$ and taking their conjunction.$^\dagger$ The study of such
random $k$-CNF formulas has attracted substantial interest in logic, optimization, combinatorics, theory of
algorithms and, more recently, statistical physics.

Say that a sequence of events $\mathcal{E}_n$ occurs with high probability (w.h.p.) if $\lim_{n\to\infty} P[\mathcal{E}_n] = 1$ and with
uniformly positive probability if $\liminf_{n\to\infty} P[\mathcal{E}_n] > 0$. We emphasize that throughout the paper $k$ is
arbitrarily large but fixed, while $n \to \infty$. For each $k \ge 2$, let

$$r_k = \sup\{r : F_k(n,rn) \text{ is satisfiable w.h.p.}\},$$
$$r_k^* = \inf\{r : F_k(n,rn) \text{ is unsatisfiable w.h.p.}\}.$$

Clearly, $r_k \le r_k^*$. The Satisfiability Threshold Conjecture asserts that $r_k = r_k^*$ for all $k \ge 3$. Our main
result establishes an asymptotic form of this conjecture.

Theorem 1 As $k \to \infty$,

$$r_k = r_k^*(1 - o(1)).$$

As we will see in Section 1.1, a classical and very simple argument gives $r_k^* \le 2^k \log 2$. The following
theorem implies that this bound is asymptotically tight. The theorem also sharpens the $o(1)$ term in
Theorem 1.
$^\dagger$Our results hold in all common models for random $k$-SAT, e.g., when clause replacement is not allowed. See Section 3.



Theorem 2 There exists a sequence $\delta_k \to 0$ such that for all $k \ge 3$,

$$r_k \ge 2^k \log 2 - (k+1)\frac{\log 2}{2} - 1 - \delta_k.$$

Theorem 2 establishes that $r_k \sim 2^k \log 2$, in agreement with the predictions of Monasson and Zecchina [23]
based on the "replica method" of statistical mechanics. Like most arguments based on the replica method,
the approach in [23] is mathematically sophisticated but far from rigorous. To the best of our knowledge,
our result is the first rigorous proof of a replica method prediction for any NP-complete problem at zero
temperature.

Obtaining tight bounds for the thresholds $r_k$ is a benchmark problem for a number of analytic and
combinatorial techniques of wider applicability [?]. The best bounds prior to our work for general
$k$, from [?] and [?] respectively, differed roughly by a factor of 2:

$$2^{k-1}\log 2 - \Theta(1) \le r_k \le r_k^* \le 2^k \log 2 - \Theta(1).$$

Traditionally, lower bounds for $r_k$ have been established by analyzing algorithms for finding satisfying
assignments, i.e., by proving in each case that some specific algorithm succeeds w.h.p. on $F_k(n,rn)$ for $r$
smaller than a certain value. Indeed, until very recently, all lower bounds for $r_k$ were algorithmic and of
order $2^k/k$. The bound $r_k \ge 2^{k-1}\log 2 - \Theta(1)$ from [?], derived via a non-algorithmic argument, was the
first to break the $2^k/k$ barrier.

Our proof of Theorem 2 is also non-algorithmic, based instead on a delicate application of the second
moment method. By not going after some particular satisfying truth assignment, as algorithms do, our
arguments offer some glimpses of the "geometry" of the set of satisfying truth assignments. Also, the
proof yields an explicit lower bound for $r_k$ for each $k \ge 3$. Already for $k \ge 4$ this improves all previously
known lower bounds for $r_k$. Below, we compare our lower bound with the best known algorithmic lower
bound [?] and the best known upper bound [?] for some small values of $k$.
1.1 Background

Franco and Paull [?], in the early 80's, observed that $r_k^* \le 2^k \log 2$. To see this, fix any truth assignment and
observe that a random $k$-clause is satisfied by it with probability $1 - 2^{-k}$. Therefore, the expected number
of satisfying truth assignments of $F_k(n,rn)$ is $[2(1 - 2^{-k})^r]^n = o(1)$ for $r \ge 2^k \log 2$. In 1990, Chao and
Franco [3] complemented this by proving that for $r < 2^k/k$ a simple algorithm, called Unit Clause (UC),
finds a satisfying truth assignment with uniformly positive probability.
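To make the first-moment calculation concrete, here is a small Python sketch (our illustration, not part of the original argument). Solving $2(1-2^{-k})^r = 1$ gives the density $r_0(k) = \log 2 / (-\log(1-2^{-k}))$ above which the expected number of satisfying assignments vanishes; since $-\log(1-x) > x$, $r_0(k)$ lies slightly below $2^k \log 2$. The function name `first_moment_threshold` is ours.

```python
# Illustrative check, not part of the original paper.
import math


def first_moment_threshold(k: int) -> float:
    """Density r_0 solving 2 * (1 - 2**-k)**r == 1.

    For r > r_0 the expected number of satisfying assignments,
    [2 (1 - 2^-k)^r]^n, tends to 0 as n grows.
    """
    # log1p(-x) computes log(1 - x) accurately for small x = 2**-k
    return math.log(2) / -math.log1p(-(2.0 ** -k))


for k in (3, 4, 5, 10, 20):
    r0 = first_moment_threshold(k)
    print(f"k={k:2d}  r_0={r0:.4f}  2^k log 2={2 ** k * math.log(2):.4f}")
```

For $k = 3$ this gives $r_0 \approx 5.19$, and as $k$ grows the gap $2^k \log 2 - r_0(k)$ approaches $(\log 2)/2$.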

At around the same time, experimental results by Cheeseman, Kanefsky and Taylor [4], and Mitchell,
Selman and Levesque [22] suggested that random $k$-SAT, while a logical model, also behaves like a physical
system in the sense that it appears to undergo a phase transition. Perhaps the first statement of the
satisfiability threshold conjecture appeared about ten years ago in the work of Chvátal and Reed [5], who
proved $r_2 = r_2^* = 1$ and, by analyzing an extension of UC, established that $r_k \ge (3/8)2^k/k$. A few years
later, Frieze and Suen [15] improved this lower bound to $r_k \ge c_k 2^k/k$, where $\lim_{k\to\infty} c_k = 1.817\ldots$, and this
remained the best bound for $r_k$ until recently.

In a breakthrough paper, Friedgut [14] proved the existence of a non-uniform threshold.

Theorem 3 (Friedgut [14]) For each $k \ge 2$, there exists a sequence $r_k(n)$ such that for every $\epsilon > 0$,

$$\lim_{n\to\infty} P[F_k(n,rn) \text{ is satisfiable}] = \begin{cases} 1 & \text{if } r = (1-\epsilon)\,r_k(n), \\ 0 & \text{if } r = (1+\epsilon)\,r_k(n). \end{cases}$$
As mentioned earlier, in [?], Moore and the first author established $r_k \ge 2^{k-1}\log 2 - 1$. Independently,
Frieze and Wormald [16] proved that if $k$ is allowed to grow with $n$, in particular if $k - \log_2 n \to +\infty$, then
random $k$-SAT has a sharp threshold around $m = n(2^k + O(1))\log 2$. See [?] for further background.

The rest of the paper is organized as follows. In the next section we recall the argument in [?], highlight
its main weakness and discuss how we overcome it. Our main idea can be implemented either by a simple
weighting scheme or by a more refined large deviations argument. Both approaches yield $2^k \log 2$ as the
leading term in the lower bound for $r_k$. The weighting scheme argument is more compact and technically
simpler. However, it gives away a factor of four in the $O(k)$ second order term. The large deviations
analysis, on the other hand, is tight for our method, up to an additive $O(1)$. We present the weighting
scheme argument in Sections 3–6. The additional material for the large deviations argument begins in
Section 7. A later section describes our derivation of explicit lower bounds for small values of $k$, and we
conclude with some discussion and open problems in the final section.



2 Outline and heuristics 

For any non-negative random variable $X$ one can get a lower bound on $P[X > 0]$ by the following inequality.

Lemma 1 For any non-negative random variable $X$,

$$P[X > 0] \ge \frac{E[X]^2}{E[X^2]}. \tag{1}$$
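For the reader's convenience, here is one standard derivation of (1) via the Cauchy–Schwarz inequality (our addition; it assumes $0 < E[X^2] < \infty$, since otherwise (1) is vacuous or undefined). It uses $X = X\,\mathbf{1}_{\{X>0\}}$ for non-negative $X$ and $\mathbf{1}_{\{X>0\}}^2 = \mathbf{1}_{\{X>0\}}$:

```latex
E[X] = E\big[X\,\mathbf{1}_{\{X>0\}}\big]
     \le E[X^2]^{1/2}\, E\big[\mathbf{1}_{\{X>0\}}\big]^{1/2}
     = \big(E[X^2]\,P[X>0]\big)^{1/2}
\quad\Longrightarrow\quad
P[X>0] \ge \frac{E[X]^2}{E[X^2]}.
```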

In particular, if $X$ denotes the number of satisfying assignments of a random formula $F_k(n,rn)$, one can
get a lower bound on the probability of satisfiability by applying (1) to $X$. We will refer to this approach
as the "vanilla" application of the second moment method. Indeed, the following immediate corollary of
Theorem 3 implies that if $P[X > 0] > 1/C$ for any constant $C > 0$, then $r_k \ge r$.

Corollary 1 Fix $k \ge 2$. If $F_k(n,rn)$ is satisfiable with uniformly positive probability then $r_k \ge r$.

Thus, if for a given $r$ we have $E[X^2] = O(E[X]^2)$, then $r_k \ge r$. Unfortunately, as we will see, this is never
the case: for every constant $r > 0$, there exists $\beta = \beta(r) > 0$ such that $E[X^2] > (1 + \beta)^n E[X]^2$.



2.1 The vanilla second moment method fails 



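Given a CNF formula $F$ on $n$ variables let $S(F) = \{\sigma : \sigma \text{ satisfies } F\} \subseteq \{0,1\}^n$ denote the set of satisfying
truth assignments of $F$ and let $X = X(F) = |S(F)|$. Then, for a $k$-CNF formula with independent clauses
$c_1, c_2, \ldots, c_m$,

$$E[X^2] = E\Big[\sum_{\sigma,\tau} \mathbf{1}_{\sigma,\tau\in S(F)}\Big] = \sum_{\sigma,\tau} \prod_{c_i} E[\mathbf{1}_{\sigma,\tau\in S(c_i)}]. \tag{2}$$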



We claim that $E[\mathbf{1}_{\sigma,\tau\in S(c_i)}]$, i.e., the probability that a fixed pair of truth assignments $\sigma, \tau$ satisfy the
$i$th random clause, depends only on the number of variables $z$ to which $\sigma$ and $\tau$ assign the same value.
Specifically, if the overlap of $\sigma$ and $\tau$ is $z = \alpha n$, we claim that this probability is

$$P[\sigma,\tau \in S(c_i)] = 1 - 2^{1-k} + 2^{-k}\alpha^k \equiv f_S(\alpha). \tag{3}$$



This follows by inclusion–exclusion: each of $\sigma$ and $\tau$ falsifies $c_i$ with probability $2^{-k}$, and if $c_i$ is not satisfied by $\sigma$,
the only way for it to also not be satisfied by $\tau$ is for all $k$ variables in $c_i$ to lie in the overlap of $\sigma$ and $\tau$. Thus, $f_S$
quantifies the correlation between $\sigma$ being satisfying and $\tau$ being satisfying as a function of their overlap. In
particular, observe that truth assignments with overlap $n/2$ are uncorrelated since $f_S(1/2) = (1 - 2^{-k})^2 = P[\sigma \text{ is satisfying}]^2$.
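Formula (3) can be checked by brute force. The Python sketch below (illustrative; the helper names are ours) models each literal of a random clause by its pair of truth values under $\sigma$ and $\tau$: with probability $\alpha$ its variable lies in the overlap, so both assignments give it the same value, and otherwise they disagree. Exact rational arithmetic via `fractions.Fraction` lets us compare against $f_S(\alpha)$ with equality rather than a tolerance.

```python
# Illustrative check, not part of the original paper.
from fractions import Fraction
from itertools import product


def literal_states(alpha):
    """Joint truth values (under sigma, under tau) of one random literal.

    With probability alpha the literal's variable lies in the overlap, so the
    two values agree; otherwise they disagree.  The literal's sign is uniform.
    """
    half = Fraction(1, 2)
    return [
        ((True, True), alpha * half),
        ((False, False), alpha * half),
        ((True, False), (1 - alpha) * half),
        ((False, True), (1 - alpha) * half),
    ]


def both_satisfy_exact(k, alpha):
    """P[sigma and tau both satisfy a random k-clause], by enumerating 4**k cases."""
    total = Fraction(0)
    for combo in product(literal_states(alpha), repeat=k):
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        sat_sigma = any(state[0] for state, _ in combo)
        sat_tau = any(state[1] for state, _ in combo)
        if sat_sigma and sat_tau:
            total += prob
    return total


def f_S(k, alpha):
    """Closed form (3): 1 - 2^(1-k) + 2^(-k) alpha^k."""
    return 1 - Fraction(2) ** (1 - k) + Fraction(2) ** (-k) * alpha ** k


print(both_satisfy_exact(5, Fraction(1, 2)))  # -> 961/1024, i.e. (31/32)**2
```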
Figure 1: k = 5, r = 14, 16, 20 (top to bottom).
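Since the number of ordered pairs of assignments with overlap $z$ is $2^n\binom{n}{z}$, we see that (2) and (3) imply

$$E[X^2] = 2^n \sum_{z=0}^n \binom{n}{z} f_S(z/n)^m.$$

Writing $z = \alpha n$ and using the approximation $\binom{n}{\alpha n} = (\alpha^\alpha(1-\alpha)^{1-\alpha})^{-n} \times \mathrm{poly}(n)$ we get

$$E[X^2] \ge 2^n\Big(\max_{0\le\alpha\le1}\frac{f_S(\alpha)^r}{\alpha^\alpha(1-\alpha)^{1-\alpha}}\Big)^n \times \mathrm{poly}(n) = \Big(\max_{0\le\alpha\le1}\Lambda_S(\alpha)\Big)^n \times \mathrm{poly}(n),$$

where $\Lambda_S(\alpha) = 2 f_S(\alpha)^r / (\alpha^\alpha(1-\alpha)^{1-\alpha})$.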



Note now that $E[X]^2 = (2^n(1 - 2^{-k})^m)^2 = (4 f_S(1/2)^r)^n = \Lambda_S(1/2)^n$. Therefore, if there exists some
$\alpha \in [0,1]$ such that $\Lambda_S(\alpha) > \Lambda_S(1/2)$, then the second moment is exponentially greater than the square of
the expectation and we only get an exponentially small lower bound for $P[X > 0]$. Put differently, unless
the dominant contribution to $E[X^2]$ comes from uncorrelated pairs of satisfying assignments, i.e., pairs with
overlap $n/2$, the second moment method fails. Indeed, for any constant $r > 0$ this is precisely what happens,
as the function $\Lambda_S$ is maximized at some $\alpha > 1/2$. The reason for this is as follows: while the entropic
factor $\ell(\alpha) = 1/(\alpha^\alpha(1-\alpha)^{1-\alpha})$ is maximized when $\alpha = 1/2$, the function $f_S$ has a positive derivative in
$(0,1)$. Therefore, the derivative of $\Lambda_S$ is never 0 at $1/2$, instead becoming 0 only when the correlation benefit
balances with the penalty of decreasing entropy at some $\alpha > 1/2$.
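A quick numerical illustration of this failure, as a Python sketch of ours (grid resolution and the use of the Figure 1 parameters are arbitrary choices): it evaluates $\log\Lambda_S(\alpha) = \log 2 + r\log f_S(\alpha) + h(\alpha)$, where $h(\alpha) = -\alpha\log\alpha - (1-\alpha)\log(1-\alpha)$, and reports where the maximum over a grid lies.

```python
# Illustrative check, not part of the original paper.
import math


def log_Lambda_S(alpha, k, r):
    """log Lambda_S(alpha) = log 2 + r log f_S(alpha) + h(alpha), where
    h(alpha) = -alpha log alpha - (1 - alpha) log(1 - alpha) (with 0 log 0 = 0)."""
    f_s = 1 - 2.0 ** (1 - k) + 2.0 ** (-k) * alpha ** k
    h = -sum(p * math.log(p) for p in (alpha, 1 - alpha) if p > 0)
    return math.log(2) + r * math.log(f_s) + h


def grid_argmax(k, r, steps=10_000):
    """Grid point in [0, 1] where log Lambda_S is largest."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: log_Lambda_S(a, k, r))


for r in (14, 16, 20):  # the values of Figure 1, with k = 5
    a = grid_argmax(5, r)
    gap = log_Lambda_S(a, 5, r) - log_Lambda_S(0.5, 5, r)
    print(f"r={r}: argmax alpha ~ {a:.4f}, log-gain over alpha=1/2: {gap:.4f}")
```

In every case the grid maximizer lies strictly to the right of $1/2$, so $E[X^2]/E[X]^2$ grows exponentially.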



2.2 Random NAE $k$-SAT and balance

In [?], the second moment method was applied successfully by considering only those satisfying truth
assignments whose complement is also satisfying. Observe that this is equivalent to interpreting $F_k(n,m)$ as
an instance of Not All Equal $k$-SAT, where $\sigma$ is a solution iff under $\sigma$ every clause has at least one satisfied
literal and at least one unsatisfied literal. In particular, if $\sigma, \tau$ have overlap $z = \alpha n$ and $c$ is a random clause, then

$$P[\sigma,\tau \text{ NAE-satisfy } c] = 1 - 2^{2-k} + 2^{1-k}(\alpha^k + (1-\alpha)^k) \equiv f_N(\alpha).$$

The key point is that $f_N$ is symmetric around $\alpha = 1/2$ and, as a result, the product $\ell(\alpha) f_N(\alpha)^r$ always has a
local extremum at $1/2$. In [?] it was shown that for $r \le 2^{k-1}\log 2 - 1$ this extremum is a global maximum,
implying that for such $r$, $F_k(n,m)$ is w.h.p. NAE-satisfiable (and hence satisfiable). It is worth noting that for $r \ge 2^{k-1}\log 2$,
w.h.p. $F_k(n,m)$ is not NAE-satisfiable, i.e., the second moment method determines the NAE-satisfiability
threshold within an additive constant. Intuitively, the symmetry of $f_N$ stems from the fact that NAE-satisfying
assignments come in complementary pairs and, thus, having overlap $z$ with an NAE-satisfying
assignment $\sigma$ (and $n - z$ with its complement $\bar\sigma$) is indistinguishable from having overlap $n - z$ with $\sigma$ (and $z$ with $\bar\sigma$).
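The formula for $f_N$ and its symmetry around $1/2$ can be confirmed by brute-force enumeration over the joint truth values of the clause's literals under $\sigma$ and $\tau$. A minimal Python sketch (ours, with hypothetical helper names), using exact rationals:

```python
# Illustrative check, not part of the original paper.
from fractions import Fraction
from itertools import product


def nae_both_exact(k, alpha):
    """P[sigma and tau both NAE-satisfy a random k-clause], enumerating 4**k cases.

    Each literal is in the overlap w.p. alpha (same value under sigma and tau)
    and outside it otherwise (opposite values); its sign is uniform.
    """
    half = Fraction(1, 2)
    states = [((True, True), alpha * half), ((False, False), alpha * half),
              ((True, False), (1 - alpha) * half), ((False, True), (1 - alpha) * half)]
    total = Fraction(0)
    for combo in product(states, repeat=k):
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        under_sigma = [state[0] for state, _ in combo]
        under_tau = [state[1] for state, _ in combo]
        # NAE: at least one true and at least one false literal
        if (any(under_sigma) and not all(under_sigma)
                and any(under_tau) and not all(under_tau)):
            total += prob
    return total


def f_N(k, alpha):
    """Closed form: 1 - 2^(2-k) + 2^(1-k) (alpha^k + (1 - alpha)^k)."""
    return 1 - Fraction(2) ** (2 - k) + Fraction(2) ** (1 - k) * (alpha ** k + (1 - alpha) ** k)


print(nae_both_exact(4, Fraction(1, 3)) == f_N(4, Fraction(1, 3)))
```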






The suspicion motivating this work is that the correlations behind the failure of the vanilla second moment 
method are mainly due to the following form of populism: satisfying assignments tend to lean towards the 
majority vote truth assignment. Observe that truth assignments that satisfy many literal occurrences in the 
random formula have significantly greater probability of being satisfying. At the same time, such assignments 
are highly correlated since, in order to satisfy many literal occurrences, they tend to agree with each other 
(and the majority truth assignment) on more than half the variables. 

Note that our suspicion regarding populism is consistent with the success of the second moment method
for random NAE $k$-SAT. In that problem, since we need to have at least one satisfied and at least one
dissatisfied literal in each clause, leaning towards majority is a disadvantage. As intuition suggests, "middle
of the road" assignments have the greatest probability of being NAE-satisfying. Alternatively, observe
that conditioning on $\sigma$ being NAE-satisfying does not increase the expected number of satisfied literal
occurrences under $\sigma$, whereas conditioning on $\sigma$ being only satisfying increases this expectation by a factor
$2^k/(2^k - 1)$ relative to the unconditional expectation $km/2$. To overcome these correlations, populism must
be discouraged and the delicacy with which this is done determines the accuracy of the resulting bound.
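The factor $2^k/(2^k-1)$ is a per-clause computation: among the $2^k$ equally likely truth patterns of a clause's literals, conditioning on satisfaction removes only the all-false pattern, while conditioning on NAE-satisfaction also removes the all-true pattern. The Python sketch below (ours) confirms both conditional means exactly; summing over the $m$ clauses gives the statement about $km/2$.

```python
# Illustrative check, not part of the original paper.
from fractions import Fraction
from itertools import product


def mean_true_literals(k, condition):
    """Mean number of true literals over the 2**k equally likely truth patterns
    of a k-clause's literals, restricted to patterns accepted by `condition`."""
    patterns = [p for p in product((0, 1), repeat=k) if condition(p)]
    return Fraction(sum(sum(p) for p in patterns), len(patterns))


def satisfied(p):
    return any(p)


def nae_satisfied(p):
    return any(p) and not all(p)


for k in (3, 4, 5):
    sat_mean = mean_true_literals(k, satisfied)
    # ratio to the unconditional mean k/2, and the NAE-conditioned mean
    print(k, sat_mean / Fraction(k, 2), mean_true_literals(k, nae_satisfied))
```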

An example from a different area, which was another inspiration for our work, is the recent proof of the
Erdős–Taylor conjecture from 1960 for the simple random walk in the planar square lattice (see [?], and
[23] for a popular account). The conjecture was that the number of visits to the most frequently visited
lattice site in the first $n$ steps of the walk is asymptotic to $(\log n)^2/\pi$. Erdős and Taylor [12] obtained a
(sharp) upper bound via an easy calculation of the expectation of the number $X_a$ of vertices visited at least
$a(\log n)^2$ times. The lower bound they obtained was four times smaller than the conjectured value. In that
setting the vanilla second moment method fails, since the events that two vertices $u, v$ are visited frequently
are highly correlated. The conjecture was proved in [7] by first recognizing the main source of the correlation
in a certain "populism" (when the random walk spends a long time in the smallest disk containing both $u$
and $v$). Replacing $X_a$ by a weighted count that discourages such loitering confirmed that this was indeed
the source of excessive correlations, as the weighted second moment was successful.

In a nutshell, our plan is to apply the second moment method to balanced satisfying truth assignments,
i.e., truth assignments that satisfy, approximately, half of all $km$ literal occurrences. As it turns out, choosing
a concrete range to represent "approximately half" and only counting the satisfying assignments that fall
within the range leads to analytic difficulties due to the polynomial corrections in certain large deviations
estimates. Fortunately, these issues can be avoided by i) introducing a scheme that weights satisfying truth
assignments according to their number of satisfied literal occurrences, and ii) tuning the scheme's control
parameter so as to concentrate the weight on balanced assignments.

2.3 Weighted second moments: a transform 

Recall that for a CNF formula $F$ on $n$ variables, $S = S(F) \subseteq \{0,1\}^n$ denotes the set of satisfying truth
assignments of $F$. An attractive feature of the second moment method is that we are free to apply it to any
non-negative random variable $X = X(F)$ such that $X > 0$ implies that $S \ne \emptyset$. Sums of the form

$$X = \sum_\sigma w(\sigma, F)$$

with $w \ge 0$ clearly have this property if $w(\sigma, F) = 0$ for $\sigma \notin S(F)$.

Weighting schemes as above can be viewed as transforms of the original problem and can be particularly
effective in exploiting insights into the source of correlations. In particular, if $w(\sigma, F)$ has product structure
over the clauses, then clause-independence allows one to replace expectations of products with products of
expectations. With this in mind, let us consider random variables of the form

$$X = \sum_\sigma \prod_c w(\sigma, c),$$

where $w$ is some arbitrary function. (Eventually, we will require that $w(\sigma, c) = 0$ if $\sigma$ falsifies $c$.) For
instance, if $w(\sigma, c)$ is the indicator that $c$ is satisfied by $\sigma$, then $X$ simply counts the number of satisfying
truth assignments. By linearity of expectation and clause-independence we see that for any function $w$,

$$E[X] = \sum_\sigma \prod_c E[w(\sigma,c)], \tag{4}$$
$$E[X^2] = \sum_{\sigma,\tau} \prod_c E[w(\sigma,c)\,w(\tau,c)]. \tag{5}$$

Since we are interested in random formulas where the literals are drawn uniformly, we will restrict
attention to functions that are independent of the variable labels. That is, for every truth assignment $\sigma$ and
every clause $c = \ell_1 \vee \cdots \vee \ell_k$, we require that $w(\sigma, c) = w(v)$, where $v_i = +1$ if $\ell_i$ is satisfied under $\sigma$ and
$v_i = -1$ if $\ell_i$ is falsified under $\sigma$. With that assumption, (4) and (5) simplify to

$$E[X] = 2^n (E[w(\sigma,c)])^m, \tag{6}$$
$$E[X^2] = \sum_{\sigma,\tau} (E[w(\sigma,c)\,w(\tau,c)])^m. \tag{7}$$

Let $A = \{-1,+1\}^k$. Since literals are drawn uniformly and independently we see that for every $\sigma$,

$$E[w(\sigma,c)] = \sum_{v\in A} w(v)\, 2^{-k}.$$

Similarly, for every pair of truth assignments $\sigma, \tau$ with overlap $z = \alpha n$,

$$E[w(\sigma,c)\,w(\tau,c)] = \sum_{u,v\in A} w(u)\,w(v)\, 2^{-k}\prod_{i=1}^k \alpha^{\mathbf{1}_{u_i=v_i}}(1-\alpha)^{\mathbf{1}_{u_i\ne v_i}}$$
$$= \sum_{u,v\in A} w(u)\,w(v)\,\Phi_{u,v}(\alpha) \equiv f_w(\alpha). \tag{8}$$

In particular, observe that $E[w(\sigma,c)]^2 = f_w(1/2)$, i.e., for every function $w$ the weights assigned to truth
assignments with overlap $n/2$ are independent.
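Formula (8) is easy to evaluate by enumerating $A \times A$. In the Python sketch below (ours; `f_w`, `mean_w`, `w_sat` and `w_geometric` are hypothetical helper names), taking $w$ to be the satisfaction indicator recovers $f_S$ from (3), and any $w$ satisfies $f_w(1/2) = E[w(\sigma,c)]^2$. The weight $v \mapsto \lambda^{(\text{number of } +1 \text{ entries})}$ with $\lambda = 1/3$ used in the check is only an arbitrary example.

```python
# Illustrative check, not part of the original paper.
from fractions import Fraction
from itertools import product


def f_w(w, k, alpha):
    """f_w(alpha) from (8): sum over u, v in A = {-1,+1}^k of
    w(u) w(v) Phi_{u,v}(alpha), with
    Phi_{u,v}(alpha) = 2^-k * prod_i (alpha if u_i == v_i else 1 - alpha)."""
    A = list(product((-1, 1), repeat=k))
    total = Fraction(0)
    for u in A:
        for v in A:
            phi = Fraction(1, 2 ** k)
            for ui, vi in zip(u, v):
                phi *= alpha if ui == vi else 1 - alpha
            total += w(u) * w(v) * phi
    return total


def mean_w(w, k):
    """E[w(sigma, c)] = 2^-k * sum_{v in A} w(v)."""
    return sum(Fraction(w(v)) for v in product((-1, 1), repeat=k)) / 2 ** k


def w_sat(v):
    """Clause-satisfaction indicator: 0 only on the all -1 vector."""
    return 0 if all(x == -1 for x in v) else 1


def w_geometric(v, lam=Fraction(1, 3)):
    """Example weight lam**(number of +1 entries); lam = 1/3 is arbitrary."""
    return lam ** sum(1 for x in v if x == 1)


print(f_w(w_sat, 3, Fraction(3, 5)))
```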

Recalling the approximation $\binom{n}{\alpha n} = (\alpha^\alpha(1-\alpha)^{1-\alpha})^{-n} \times \mathrm{poly}(n)$ we see that (7) and (8) imply

$$E[X^2] = 2^n \sum_{z=0}^n \binom{n}{z} f_w(z/n)^m \tag{9}$$
$$\le 2^n\Big(\max_{0\le\alpha\le1}\frac{f_w(\alpha)^r}{\alpha^\alpha(1-\alpha)^{1-\alpha}}\Big)^n \times \mathrm{poly}(n) = \Big(\max_{0\le\alpha\le1}\Lambda_w(\alpha)\Big)^n \times \mathrm{poly}(n). \tag{10}$$

Observe that $\Lambda_w(1/2)^n = (4 f_w(1/2)^r)^n = E[X]^2$. Moreover, we will see later that a more careful analysis
of the sum in (9) allows one to replace the polynomial factor in (10) by $O(1)$. Therefore, if $\Lambda_w(1/2)$ is the
global maximum of $\Lambda_w$ then $E[X^2]/E[X]^2 = O(1)$ and the second moment method succeeds.

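A necessary condition for $\Lambda_w(1/2)$ to be a global maximum is that $\Lambda_w'(1/2) = 0$. Since $\Lambda_w(\alpha) =
2\ell(\alpha) f_w(\alpha)^r$ and $\ell'(1/2) = 0$, this dictates $f_w'(1/2) = 0$. Differentiating $f_w$ we get

$$f_w'(\alpha) = \sum_{u,v\in A} w(u)\,w(v)\,\Phi_{u,v}(\alpha)\,[\log\Phi_{u,v}(\alpha)]'$$
$$= \sum_{u,v\in A} w(u)\,w(v)\,\Phi_{u,v}(\alpha)\sum_{i=1}^k\left(\frac{\mathbf{1}_{u_i=v_i}}{\alpha} - \frac{\mathbf{1}_{u_i\ne v_i}}{1-\alpha}\right).$$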
In particular, letting $u \cdot v$ denote the inner product of $u$ and $v$, and noting that at $\alpha = 1/2$ the inner sum
equals $2\,u\cdot v$ while $\Phi_{u,v}(1/2) = 2^{-2k}$, we see that

$$f_w'(1/2) = 2^{-2k+1}\sum_{u,v\in A} w(u)\,w(v)\,u\cdot v = 2^{-2k+1}\Big(\sum_{u\in A} w(u)\,u\Big)\cdot\Big(\sum_{v\in A} w(v)\,v\Big). \tag{11}$$

Therefore, for any function $w$,

$$f_w'(1/2) = 0 \iff \sum_{v\in A} w(v)\,v = 0. \tag{12}$$

We can interpret the vanilla application of the second moment method as using a function $w = w_S$ which
assigns 0 to $(-1, \ldots, -1)$ and $1/(2^k - 1)$ to all other vectors. (It is convenient to always normalize $w$ so that
$\sum_v w(v) = 1$.) The fact that $w_S$ violates the r.h.s. of (12) implies that this attempt must fail. In [?], on the
other hand, $w = w_N$ assigns 0 both to $(-1, \ldots, -1)$ and to $(+1, \ldots, +1)$ (and $1/(2^k - 2)$ to all other vectors),
thus satisfying (12) and enabling the second moment method. Nevertheless, this particular rebalancing of
the vectors is rather heavy-handed, since it makes it twice as likely to assign zero to a random clause.

To achieve better results we would like to choose a function $w$ that is "as close as possible" to $w_S$
while satisfying (12). That is, we would like $w$ to have minimal relative entropy with respect to $w_S$
subject to (12) (see Definition 2.15 of [?]). Since $w_S$ is constant over all $v \ne (-1, \ldots, -1)$ and we
must have $w(-1, \ldots, -1) = w_S(-1, \ldots, -1) = 0$, this means that $w$ should have maximum entropy over
$v \ne (-1, \ldots, -1)$ while satisfying (12). So, all in all, we are seeking a maximum-entropy collection of weights
for the vectors in $A$ such that i) the all $-1$s vector has weight 0, and ii) the weighted vectors cancel out.

For $x \in A$, let $|x|$ denote the number of $+1$s in $x$. By summing the r.h.s. of (12) over the coordinates we
see that a necessary condition for the optimality of $w$ is

$$\sum_{v \ne (-1,\ldots,-1)} w(v)\,(2|v| - k) = 0. \tag{13}$$

Maximizing entropy subject to (13) is a standard Lagrange multipliers problem. Its unique solution is

$$w(v) = \frac{1}{Z}\lambda^{|v|}, \qquad v \ne (-1,\ldots,-1), \tag{14}$$

where $Z$ is a normalizing constant and $\lambda \in (0,1)$ satisfies $(1 + \lambda)^{k-1} = 1/(1 - \lambda)$ so that (13) is satisfied, i.e.,

$$\sum_{j=1}^k \binom{k}{j}\lambda^j(2j - k) = k\left(1 - (1+\lambda)^{k-1}(1-\lambda)\right) = 0. \tag{15}$$

Note now that for $w$ given by (14), symmetry ensures that all coordinates of $\sum_v w(v)\,v$ are equal. Since,
by (15), the sum over these coordinates vanishes, we see that in fact (12) must hold as well. Therefore, $w$ is
indeed the optimal solution for our original problem.

We plot below the functions $f_w$ and $\Lambda_w$ corresponding to this weighting, for the same values of $k, r$ as in Figure 1
(with a normalization for $\sum_u w(u)$ which makes the plot scale analogous to that in Figure 1 and which will
be more convenient for computing $f_w$ and $\Lambda_w$ in the next section).
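The constant $\lambda$ in (14) and (15) has no closed form for general $k$, but it is easy to compute. Note that $\lambda = 0$ also solves $(1+\lambda)^{k-1}(1-\lambda) = 1$; the relevant root is the one in $(0,1)$, which for $k \ge 3$ is unique because $(k-1)\log(1+\lambda) + \log(1-\lambda)$ is concave, positive just above 0, and tends to $-\infty$ as $\lambda \to 1$. The Python sketch below (ours; function names are hypothetical) finds it by bisection and checks that the resulting weights satisfy (12). Double precision is assumed adequate, which limits it to moderate $k$ (roughly $k \le 30$).

```python
# Illustrative check, not part of the original paper.
import math
from itertools import product


def solve_lambda(k, iters=200):
    """Root in (0, 1) of (1 + lam)**(k - 1) * (1 - lam) = 1 (lam = 0 is a
    trivial root).  Bisection on g(lam) = (k-1) log(1+lam) + log(1-lam), which
    is positive at lam = 1e-12 and negative at lam = 1 - 1e-12 for 3 <= k <= 40."""
    def g(lam):
        return (k - 1) * math.log1p(lam) + math.log1p(-lam)

    lo, hi = 1e-12, 1 - 1e-12
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def max_entropy_weights(k, lam):
    """Weights (14): w(v) proportional to lam**|v| for v != (-1, ..., -1),
    where |v| counts +1 entries; the all -1 vector gets weight 0."""
    support = [v for v in product((-1, 1), repeat=k) if 1 in v]
    raw = {v: lam ** v.count(1) for v in support}
    Z = sum(raw.values())
    return {v: x / Z for v, x in raw.items()}


k = 5
lam = solve_lambda(k)
w = max_entropy_weights(k, lam)
# Condition (12): the weighted vectors cancel, coordinate by coordinate.
print(lam, [sum(wv * v[i] for v, wv in w.items()) for i in range(k)])
```

For $k = 3$ the root is $(\sqrt{5}-1)/2$, since $(1+\lambda)^2(1-\lambda) - 1 = \lambda(1 - \lambda - \lambda^2)$.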



In conclusion, if $L(\sigma, F)$ denotes the number of satisfied literal occurrences in $F$ under $\sigma$, we will take

$$w(\sigma, F) \propto \prod_c \lambda^{L(\sigma,c)}\,\mathbf{1}_{\sigma\in S(c)}, \tag{16}$$

where $(1 + \lambda)^{k-1} = 1/(1 - \lambda)$. The above weighting scheme yields Theorem 4 below, which we will prove
in Sections 3–6. Theorem 4 has the same leading term as Theorem 2 but a linear correction term 4 times
greater. This lost factor of 4 is due to our insistence that $w(\sigma, F)$ factorizes perfectly over the clauses. From
Section 7 onward we go beyond what can be achieved with perfect factorization by performing a truncation.
This will allow us to prove Theorem 2, which gives a lower bound for $r_k$ that is within an additive constant
of the upper bound for the existence of balanced satisfying assignments.
Figure 2: $f_w(\alpha)$ and $\Lambda_w(\alpha) = 2\ell(\alpha) f_w(\alpha)^r$ plotted against $\alpha \in [0,1]$; $k = 5$, $r = 14, 16, 20$ (top to bottom).

Theorem 4 There exists a sequence $\beta_k \to 0$ such that for all $k \ge 3$,

$$r_k \ge 2^k \log 2 - 2(k+1)\log 2 - 1 - \beta_k.$$

3 Groundwork 

Given a $k$-CNF formula $F$ on $n$ variables, recall that $S(F)$ is the set of satisfying truth assignments of $F$.
Given $\sigma \in \{0,1\}^n$ let $H = H(\sigma, F)$ be the number of satisfied literal occurrences in $F$ under $\sigma$ less the
number of unsatisfied literal occurrences in $F$ under $\sigma$. For any $0 < \gamma < 1$, let

$$X = \sum_\sigma \gamma^{H(\sigma,F)}\,\mathbf{1}_{\sigma\in S(F)}.$$

(Note that $\gamma^{H(\sigma,F)} = \gamma^{2L(\sigma,F) - km}$, so this is consistent with (16) for $\gamma^2 = \lambda$.)

Recall that in $F_k(n,m)$ the $m$ clauses $\{c_i\}_{i=1}^m$ are i.i.d. random variables, $c_i$ being the disjunction of $k$
i.i.d. random variables $\{\ell_{ij}\}_{j=1}^k$, each $\ell_{ij}$ being a uniformly random literal. Clearly, in this model a clause
may be improper, i.e., it might contain repeated and/or contradictory literals. At the same time, though,
observe that the probability that a random clause is improper is smaller than $k^2/n$ and, moreover, the proper
clauses are uniformly selected among all proper clauses. Therefore w.h.p. the number of improper clauses is
$o(n)$, implying that if for a given $r$, $F_k(n,rn)$ is satisfiable w.h.p., then for $m = rn - o(n)$ the same is true in
the model where we only select among proper clauses. The issue of selecting clauses without replacement is
completely analogous, as w.h.p. there are $o(n)$ clauses that contain the same $k$ variables as some other clause.
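The bound on improper clauses is a birthday-problem estimate: a clause is improper exactly when its $k$ independently chosen variables are not all distinct, which by the union bound over pairs has probability at most $\binom{k}{2}/n < k^2/n$. The Python sketch below (ours) computes this probability exactly for a few illustrative values of $k$ and $n$. With $rn$ clauses, the expected number of improper clauses is thus less than $rk^2 = O(1)$, so Markov's inequality gives the $o(n)$ claim.

```python
# Illustrative check, not part of the original paper.
from fractions import Fraction
from math import comb


def p_improper(k, n):
    """Exact probability that k i.i.d. uniformly random literals over n variables
    mention some variable more than once (repeated or contradictory literals)."""
    p_all_distinct = Fraction(1)
    for i in range(k):
        p_all_distinct *= Fraction(n - i, n)
    return 1 - p_all_distinct


for k, n in ((3, 1000), (5, 1000), (5, 10 ** 6)):
    p = p_improper(k, n)
    print(k, n, float(p), comb(k, 2) / n)  # exact value vs. union bound
```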



3.1 The first moment 

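For any fixed truth assignment $\sigma$ and a random $k$-clause $c = \ell_1 \vee \cdots \vee \ell_k$, since the literals $\ell_1, \ldots, \ell_k$ are
i.i.d., and since $H(\sigma, c) = -k$ whenever $\sigma$ falsifies $c$, we have

$$E[\gamma^{H(\sigma,c)}\mathbf{1}_{\sigma\in S(c)}] = E[\gamma^{H(\sigma,c)}] - E[\gamma^{-k}\mathbf{1}_{\sigma\notin S(c)}] = \left(\frac{\gamma+\gamma^{-1}}{2}\right)^k - (2\gamma)^{-k} \equiv \psi(\gamma).$$

Thus, since the $m = rn$ clauses $c_1, c_2, \ldots, c_m$ are i.i.d.,

$$E[X] = E\Big[\sum_\sigma \prod_{c_i}\gamma^{H(\sigma,c_i)}\mathbf{1}_{\sigma\in S(c_i)}\Big] = \sum_\sigma \prod_{c_i} E[\gamma^{H(\sigma,c_i)}\mathbf{1}_{\sigma\in S(c_i)}] = (2\psi(\gamma)^r)^n. \tag{17}$$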
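As a sanity check on $\psi(\gamma)$, the Python sketch below (ours) compares the closed form with a direct average over the $2^k$ equally likely true/false patterns of the literals under $\sigma$, where each true literal contributes $+1$ and each false literal $-1$ to $H(\sigma,c)$. At $\gamma = 1$ the weight disappears and $\psi(1) = 1 - 2^{-k}$, matching the unweighted count.

```python
# Illustrative check, not part of the original paper.
import math
from itertools import product


def psi(gamma, k):
    """Closed form: ((gamma + 1/gamma) / 2)**k - (2 * gamma)**(-k)."""
    return ((gamma + 1 / gamma) / 2) ** k - (2 * gamma) ** (-k)


def psi_bruteforce(gamma, k):
    """E[gamma^H(sigma,c) 1{sigma satisfies c}] as an average over the 2**k
    equally likely patterns of literal values under sigma (+1 true, -1 false)."""
    total = 0.0
    for pattern in product((1, -1), repeat=k):
        if 1 in pattern:                      # clause satisfied by sigma
            total += gamma ** sum(pattern)    # H(sigma, c) = sum of the +-1 entries
    return total / 2 ** k


gamma = math.sqrt(1 - 0.2)   # gamma^2 = 1 - eps with eps = 0.2 (arbitrary)
print(psi(gamma, 4), psi_bruteforce(gamma, 4))
```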



3.2 The second moment 

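Let $\sigma, \tau$ be any pair of truth assignments that agree on $z = \alpha n$ variables. If $\ell_1, \ell_2, \ldots, \ell_k$ are i.i.d. uniformly
random literals and $c = \ell_1 \vee \ell_2 \vee \cdots \vee \ell_k$, then for each $i$

$$E\big[\gamma^{H(\sigma,\ell_i)+H(\tau,\ell_i)}\big] = \alpha\,\frac{\gamma^2+\gamma^{-2}}{2} + 1 - \alpha,$$
$$E\big[\gamma^{H(\sigma,\ell_i)+H(\tau,\ell_i)}\mathbf{1}_{\sigma\notin S(\ell_i)}\big] = \frac{1}{2}\left(\alpha\gamma^{-2} + (1-\alpha)\right),$$
$$E\big[\gamma^{H(\sigma,\ell_i)+H(\tau,\ell_i)}\mathbf{1}_{\sigma,\tau\notin S(\ell_i)}\big] = \frac{1}{2}\,\alpha\gamma^{-2}.$$

Since $\ell_1, \ell_2, \ldots, \ell_k$ are i.i.d., writing $\gamma^2 = 1 - \epsilon$, we have

$$E\big[\gamma^{H(\sigma,c)+H(\tau,c)}\mathbf{1}_{\sigma,\tau\in S(c)}\big] = E\big[\gamma^{H(\sigma,c)+H(\tau,c)}\big(1 - \mathbf{1}_{\sigma\notin S(c)} - \mathbf{1}_{\tau\notin S(c)} + \mathbf{1}_{\sigma,\tau\notin S(c)}\big)\big]$$
$$= \left(\alpha\,\frac{\gamma^2+\gamma^{-2}}{2} + 1 - \alpha\right)^k - 2^{1-k}\left(\alpha\gamma^{-2} + 1 - \alpha\right)^k + 2^{-k}\left(\alpha\gamma^{-2}\right)^k \tag{18}$$
$$= \frac{(2 - 2\epsilon + \alpha\epsilon^2)^k - 2(1 - \epsilon + \alpha\epsilon)^k + \alpha^k}{2^k(1-\epsilon)^k} \equiv \frac{f(\alpha)}{2^k(1-\epsilon)^k}, \tag{19}$$

where the dependence of $f$ on $\epsilon = 1 - \gamma^2$ is implicit. (Taking $\epsilon = 1 - \lambda$ in (18) yields the $f_w$ of Figure 2.)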
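Equation (19) can also be verified mechanically. The Python sketch below (ours) enumerates the four possible (value under $\sigma$, value under $\tau$) states of each literal, with the probabilities used above, and compares $E[\gamma^{H(\sigma,c)+H(\tau,c)}\mathbf{1}_{\sigma,\tau\in S(c)}]$ with $f(\alpha)/(2^k(1-\epsilon)^k)$ for a few arbitrary values of $k$, $\epsilon$ and $\alpha$, using floating-point tolerances.

```python
# Illustrative check, not part of the original paper.
import math
from itertools import product


def f(alpha, k, eps):
    """f(alpha) from (19)."""
    return ((2 - 2 * eps + alpha * eps ** 2) ** k
            - 2 * (1 - eps + alpha * eps) ** k
            + alpha ** k)


def pair_moment_bruteforce(alpha, k, gamma):
    """E[gamma^(H(sigma,c) + H(tau,c)) 1{sigma and tau satisfy c}], enumerating
    the per-literal states (value under sigma, value under tau), +1 = true."""
    states = [((1, 1), alpha / 2), ((-1, -1), alpha / 2),             # in the overlap
              ((1, -1), (1 - alpha) / 2), ((-1, 1), (1 - alpha) / 2)]  # outside it
    total = 0.0
    for combo in product(states, repeat=k):
        prob, h_sigma, h_tau = 1.0, 0, 0
        for (s, t), p in combo:
            prob *= p
            h_sigma += s
            h_tau += t
        # a clause is falsified exactly when all of its literals are false (sum = -k)
        if h_sigma > -k and h_tau > -k:
            total += prob * gamma ** (h_sigma + h_tau)
    return total


k, eps, alpha = 4, 0.3, 0.7   # arbitrary illustrative values
gamma = math.sqrt(1 - eps)
print(pair_moment_bruteforce(alpha, k, gamma), f(alpha, k, eps) / (2 ** k * (1 - eps) ** k))
```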
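Thus, for a random $k$-CNF formula whose $m = rn$ clauses $c_1, c_2, \ldots, c_m$ are constructed independently,

$$E[X^2] = E\Big[\sum_{\sigma,\tau\in S(F)}\gamma^{H(\sigma,F)+H(\tau,F)}\Big] = \sum_{\sigma,\tau}\prod_{c_i}E\big[\gamma^{H(\sigma,c_i)+H(\tau,c_i)}\mathbf{1}_{\sigma,\tau\in S(c_i)}\big]. \tag{20}$$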
Since the number of ordered pairs of assignments with overlap $z$ is $2^n\binom{n}{z}$ and since the $m = rn$ clauses
are identically distributed, (20) and (19) imply

$$E[X^2] = 2^n\sum_{z=0}^n\binom{n}{z}\left(\frac{f(z/n)}{2^k(1-\epsilon)^k}\right)^{rn}. \tag{21}$$

Observe now that for any fixed value of $\epsilon$, $f^r$ is real, positive and twice-differentiable. Thus, to bound
the sum in (21) we can use the following lemma of [?]. The idea is that sums of this type are dominated by
the contribution of $O(n^{1/2})$ terms around the maximum term, and the proof follows by applying the Laplace
method of asymptotic analysis [?].

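Lemma 2 Let $\phi$ be any real, positive, twice-differentiable function on $[0,1]$ and let

$$S_n = \sum_{z=0}^n \binom{n}{z}\phi(z/n)^n.$$

Letting $0^0 \equiv 1$, define $g$ on $[0,1]$ as

$$g(\alpha) = \frac{\phi(\alpha)}{\alpha^\alpha(1-\alpha)^{1-\alpha}}.$$

If there exists $\alpha_{\max} \in (0,1)$ such that $g(\alpha_{\max}) \equiv g_{\max} > g(\alpha)$ for all $\alpha \ne \alpha_{\max}$, and $g''(\alpha_{\max}) < 0$, then
there exist constants $B, C > 0$ such that for all sufficiently large $n$

$$B \times g_{\max}^n \le S_n \le C \times g_{\max}^n.$$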

With Lemma 2 in mind, let us define

$$g_r(\alpha) = \frac{f(\alpha)^r}{\alpha^\alpha(1-\alpha)^{1-\alpha}}. \tag{22}$$



Let

$$s_k = 2^k\log 2 - 2(k+1)\log 2 - 1 - 3/k.$$

We will prove that

Lemma 3 Let $\epsilon \in (0,1)$ be such that

$$\epsilon(2-\epsilon)^{k-1} = 1. \tag{23}$$

For all $k \ge 22$, if $r < s_k$ then $g_r(1/2) > g_r(\alpha)$ for all $\alpha \ne 1/2$, and $g_r''(1/2) < 0$.
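As a result, for $r, k, \epsilon$ as in Lemma 3 we have

$$E[X^2] < C \times \left(\frac{2g_r(1/2)}{(2(1-\epsilon))^{kr}}\right)^n, \tag{24}$$

where $C = C(k)$ is independent of $n$. Observe now that (17) and the fact $\gamma^2 = 1 - \epsilon$ imply (recall that at overlap $1/2$
the two weights are independent, so $\psi(\gamma)^2 = f(1/2)/(2^k(1-\epsilon)^k)$)

$$E[X]^2 = \left[(2\psi(\gamma)^r)^n\right]^2 = \left(4\left(\frac{f(1/2)}{2^k(1-\epsilon)^k}\right)^r\right)^n = \left(\frac{2g_r(1/2)}{(2(1-\epsilon))^{kr}}\right)^n. \tag{25}$$

Therefore, by (24) and (25) we see that for $r, k, \epsilon$ as in Lemma 3 we have

$$E[X^2] < C \times E[X]^2.$$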
By Lemma 1 this implies $P[X > 0] > 1/C$ and, hence, Lemma 3 along with Corollary 1 implies Theorem 4.



To prove Lemma 3 we will prove the following three lemmata. The first lemma holds for any $\epsilon \in [0,1)$
and reduces the proof to the case $\alpha > 1/2$. The second lemma controls the behavior of $f$ (and thus $g_r$)
around $\alpha = 1/2$ and demands the judicious choice of $\epsilon$ specified by (23). We note that this is the only value
of $\epsilon$ for which $g_r$ has a local maximum at $1/2$, for any $r > 0$. The third lemma deals with $\alpha$ near 1. That
case needs to be handled separately because $g_r$ has another local maximum in that region. The condition
$r < s_k$ aims precisely at keeping the value of $g_r$ at this other local maximum smaller than $g_r(1/2)$.

Lemma 4 For all $\epsilon, x > 0$, $g_r(1/2 + x) \ge g_r(1/2 - x)$.

Lemma 5 Let $\epsilon$ satisfy (23). For all $k \ge 22$, if $r \le 2^k\log 2$ then $g_r(1/2) > g_r(\alpha)$ for all $\alpha \in (1/2, 4/5]$ and
$g_r''(1/2) < 0$.

Lemma 6 Let $\epsilon$ satisfy (23). For all $k \ge 22$, if $r < s_k$ then $g_r(1/2) > g_r(\alpha)$ for all $\alpha \in (4/5, 1]$.

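The following bound will be useful. If $\epsilon \in (0,1)$ satisfies (23), then

$$2^{1-k} + k4^{-k} < \epsilon < 2^{1-k} + 3k4^{-k}. \tag{26}$$

To prove (26), let $q(x) = x - 1/(2-x)^{k-1}$, so that (23) is equivalent to $q(\epsilon) = 0$, and note that $q$ is increasing on
$[0, 1/2]$. It then suffices to observe that for all $k \ge 3$, the quantity $q(2^{1-k} + ck4^{-k})$ is
negative for $c = 1$ but positive for $c = 3$.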
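Numerically, (26) is easy to confirm. The Python sketch below (ours) computes the root of (23) in $(0,1/2)$ by bisection on $q$, and checks it against the two bounds for $3 \le k \le 25$; for $k = 3$ the root is $(3-\sqrt{5})/2$ in closed form, since $\epsilon(2-\epsilon)^2 - 1 = (\epsilon - 1)(\epsilon^2 - 3\epsilon + 1)$. Double precision is assumed to suffice in this range.

```python
# Illustrative check, not part of the original paper.
def solve_eps(k, iters=200):
    """Root in (0, 1/2) of q(x) = x - (2 - x)**(1 - k), i.e. of (23).

    For k >= 3, q is increasing on [0, 1/2] with q(0) < 0 < q(1/2); the other
    root of (23) in (0, 1], namely eps = 1, is not the one we want.
    """
    def q(x):
        return x - (2 - x) ** (1 - k)

    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = (lo + hi) / 2
        if q(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for k in (3, 4, 10, 20):
    eps = solve_eps(k)
    lower = 2.0 ** (1 - k) + k * 4.0 ** (-k)
    upper = 2.0 ** (1 - k) + 3 * k * 4.0 ** (-k)
    print(k, eps, lower < eps < upper)
```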

4 Proof of Lemma 4

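Observe that $\alpha^\alpha(1-\alpha)^{1-\alpha}$ is symmetric around $1/2$ and that $r > 0$. Therefore, it suffices to prove that
$f(1/2 + x) \ge f(1/2 - x)$ for all $x > 0$. To do this we first note that for all $x$,

$$2^k f(1/2 + x) = \left((2-\epsilon)^2 + 2x\epsilon^2\right)^k - 2\left(2 - \epsilon + 2x\epsilon\right)^k + (1+2x)^k$$
$$= \sum_{j=0}^k \binom{k}{j}\left[(2-\epsilon)^{2(k-j)}(2x\epsilon^2)^j - 2(2-\epsilon)^{k-j}(2x\epsilon)^j + (2x)^j\right]$$
$$= \sum_{j=0}^k \binom{k}{j}(2x)^j\left[(2-\epsilon)^{2(k-j)}\epsilon^{2j} - 2(2-\epsilon)^{k-j}\epsilon^j + 1\right]$$
$$= \sum_{j=0}^k \binom{k}{j}(2x)^j\left[(2-\epsilon)^{k-j}\epsilon^j - 1\right]^2. \tag{27}$$

Thus, for all $x > 0$, since only the terms with odd $j$ survive the subtraction,

$$f(1/2+x) - f(1/2-x) = 2^{-k}\sum_{j=0}^k\binom{k}{j}(2x)^j\left[(2-\epsilon)^{k-j}\epsilon^j - 1\right]^2\left(1 - (-1)^j\right) \ge 0.$$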
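Both the sum-of-squares identity (27) and the conclusion of Lemma 4 can be spot-checked numerically. In the Python sketch below (ours), `f` implements $f(\alpha)$ from (19) and `f_series` the right-hand side of (27) divided by $2^k$; the values of $k$, $\epsilon$ and $x$ are arbitrary, and $\epsilon$ is not required to satisfy (23), matching the fact that (27) and Lemma 4 hold for any $\epsilon$.

```python
# Illustrative check, not part of the original paper.
import math


def f(alpha, k, eps):
    """f(alpha) from (19)."""
    return ((2 - 2 * eps + alpha * eps ** 2) ** k
            - 2 * (1 - eps + alpha * eps) ** k
            + alpha ** k)


def f_series(x, k, eps):
    """f(1/2 + x) via the sum of squares in (27), divided by 2**k."""
    total = sum(math.comb(k, j) * (2 * x) ** j
                * ((2 - eps) ** (k - j) * eps ** j - 1) ** 2
                for j in range(k + 1))
    return total / 2 ** k


k, eps = 6, 0.2   # arbitrary illustrative values
for x in (0.1, 0.3):
    print(x, f(0.5 + x, k, eps), f_series(x, k, eps), f(0.5 - x, k, eps))
```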

5 Proof of Lemma 5

We will prove that for all $k \ge 22$ and $r \le 2^k\log 2$, $g_r$ is strictly decreasing in $(1/2, 4/5]$. We have

$$f'(\alpha) = k\left[(2 - 2\epsilon + \alpha\epsilon^2)^{k-1}\epsilon^2 - 2(1 - \epsilon + \alpha\epsilon)^{k-1}\epsilon + \alpha^{k-1}\right],$$
$$g_r'(\alpha) = \frac{f(\alpha)^{r-1}\left(rf'(\alpha) + f(\alpha)(\log(1-\alpha) - \log\alpha)\right)}{\alpha^\alpha(1-\alpha)^{1-\alpha}}. \tag{28}$$

So, $f'(1/2) = k2^{-k+1}\left((2-\epsilon)^{k-1}\epsilon - 1\right)^2$ and since, by (23), we have $(2-\epsilon)^{k-1}\epsilon = 1$, we get

$$g_r'(1/2) = f'(1/2) = 0. \tag{29}$$

Since $g_r'(1/2) = 0$ and, by (27), $f(\alpha) > 0$ for all $\alpha$, we see that (28) implies that to prove that $g_r$ is
decreasing in $(1/2, 4/5]$ it suffices to prove that the derivative of

$$rf'(\alpha) + f(\alpha)(\log(1-\alpha) - \log\alpha) \tag{30}$$

is negative in $(1/2, 4/5]$. We will actually prove this claim for $\alpha \in [1/2, 4/5]$. Since $f'(1/2) = 0$ this also
establishes our claim that $g_r''(1/2) < 0$. The derivative of (30) is

$$rf''(\alpha) + f'(\alpha)(\log(1-\alpha) - \log\alpha) - f(\alpha)\left(\frac{1}{\alpha} + \frac{1}{1-\alpha}\right). \tag{31}$$

By considering (27), we see that $f$ is non-decreasing in $[1/2, 1]$. Since $\log(1-\alpha) \le \log\alpha$ for $\alpha \in [1/2, 1)$, it
follows that in order to prove that the expression in (31) is negative it suffices to show that

$$rf''(\alpha) < f(\alpha)\left(\frac{1}{\alpha} + \frac{1}{1-\alpha}\right).$$

Since, by definition, $\epsilon < 1$, it follows that $\alpha\epsilon^2 < 2\epsilon$, implying that we can bound $f''$ as

$$f''(\alpha) = k(k-1)\left((2 - 2\epsilon + \alpha\epsilon^2)^{k-2}\epsilon^4 - 2(1 - \epsilon + \alpha\epsilon)^{k-2}\epsilon^2 + \alpha^{k-2}\right)$$
$$< k^2\left(2^{k-2}\epsilon^4 + (4/5)^{k-2}\right). \tag{32}$$

At the same time, $1/\alpha + 1/(1-\alpha) \ge 4$ and $f(\alpha) \ge f(1/2) = 2^{-k}((2-\epsilon)^k - 1)^2$. Therefore, if $\epsilon_u$ is any
upper bound on $\epsilon$ it suffices to establish

$$r \times k^2\left(2^{k-2}\epsilon_u^4 + (4/5)^{k-2}\right) < 4 \times 2^{-k}\left((2-\epsilon_u)^k - 1\right)^2. \tag{33}$$

Invoking (26) to take $\epsilon_u = 2^{1-k} + 3k4^{-k}$, it is easy to verify that (33) holds for $k \ge 22$ and $r = 2^k\log 2$. □

Corollary 2 For all $k \ge 65$, if $r \le 2^k\log 2$ then $g_r(1/2) > g_r(\alpha)$ for all $\alpha \in (1/2, 9/10]$ and $g_r''(1/2) < 0$.
Proof. If in (33) we replace $4/5$ with $9/10$ and take $r = 2^k\log 2$, then the inequality is valid for all $k \ge 65$.
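The phrase "it is easy to verify" hides a short computation, which the Python sketch below (ours) carries out in double precision: it evaluates both sides of (33) with $r = 2^k\log 2$ and $\epsilon_u = 2^{1-k} + 3k4^{-k}$, first with $4/5$ and then with $9/10$ as in Corollary 2. The checks confirm that (33) fails at $k = 21$ (resp. $k = 64$) and holds for every $k$ from 22 (resp. 65) up to 199.

```python
# Illustrative check, not part of the original paper.
import math


def ineq33_holds(k, alpha_max=4 / 5):
    """Evaluate both sides of (33) with r = 2^k log 2 and
    eps_u = 2^(1-k) + 3k 4^(-k); alpha_max = 9/10 gives the Corollary 2 variant."""
    r = 2 ** k * math.log(2)
    eps_u = 2.0 ** (1 - k) + 3 * k * 4.0 ** (-k)
    lhs = r * k ** 2 * (2 ** (k - 2) * eps_u ** 4 + alpha_max ** (k - 2))
    rhs = 4 * 2.0 ** (-k) * ((2 - eps_u) ** k - 1) ** 2
    return lhs < rhs


def first_k_from_which_it_holds(alpha_max, k_max=200):
    """Smallest k such that (33) holds for every k' in [k, k_max)."""
    return min(k for k in range(3, k_max)
               if all(ineq33_holds(j, alpha_max) for j in range(k, k_max)))


print(first_k_from_which_it_holds(4 / 5))   # -> 22
print(first_k_from_which_it_holds(9 / 10))  # -> 65
```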



6 Proof of Lemma 6 

First observe that the inequality $g_r(1/2) > g_r(\alpha)$ is equivalent to

$$\left(\frac{f(\alpha)}{f(1/2)}\right)^r < 2\alpha^\alpha(1-\alpha)^{1-\alpha}. \tag{34}$$

Recall now that, by (27), $f$ is increasing in $(1/2, 1]$, implying $f(\alpha) - f(1/2) > 0$, and that for all $x > 0$,
$\log(1 + x) < x$. Thus, the logarithm of the left hand side above can be bounded as

$$r\log\left(\frac{f(\alpha)}{f(1/2)}\right) = r\log\left(1 + \frac{f(\alpha) - f(1/2)}{f(1/2)}\right) < r\,\frac{f(\alpha) - f(1/2)}{f(1/2)}.$$

So, if we let $h(\alpha) = -\alpha\log\alpha - (1-\alpha)\log(1-\alpha)$ we see that (34) holds if

$$r\,\frac{f(\alpha) - f(1/2)}{f(1/2)} \le \log 2 - h(\alpha).$$

To get a lower bound on $f(1/2)$ we use the upper bound for $\epsilon$ from (26), yielding

$$f(1/2) = (2 - 2\epsilon + \epsilon^2/2)^k - 2(1 - \epsilon/2)^k + (1/2)^k$$
$$> (2(1-\epsilon))^k - 2$$
$$> 2^k(1 - k\epsilon) - 2$$
$$> 2^k\left(1 - k(2^{1-k} + 3k4^{-k})\right) - 2$$
$$= 2^k - 2k - 2 - 3k^2 2^{-k}. \tag{35}$$

To get an upper bound on $f(\alpha) - f(1/2)$ we let $\alpha = 1/2 + x$ and consider the sum in (27). By our choice
of $\epsilon$ in (23) we see that: i) the term corresponding to $j = 1$ vanishes, yielding (36), and ii) for all $j > 1$,
$0 < (2-\epsilon)^{k-j}\epsilon^j < 1$, yielding (37). That is,

$$f(1/2 + x) = f(1/2) + 2^{-k}\sum_{j=2}^k\binom{k}{j}(2x)^j\left[(2-\epsilon)^{k-j}\epsilon^j - 1\right]^2 \tag{36}$$
$$< f(1/2) + 2^{-k}\sum_{j=2}^k\binom{k}{j}(2x)^j \tag{37}$$
$$< f(1/2) + \alpha^k. \tag{38}$$

Therefore, we see that (34) holds as long as

$$r \le f(1/2)\,\frac{\log 2 - h(\alpha)}{\alpha^k} \equiv f(1/2)\,\phi(\alpha).$$

We start by getting a lower bound for for all a 6 (1/2, 1]. For that, we let y = 1 — a and observe that 
for all < y < 1/2 

-/i(l-y) > log(l -y) +ylogy > -y-y 2 + ylogy (39) 

and 

> (1 + y) k > 1 + ky . 



(l-y)k 
Therefore, 

log2-/i(l-y) 



0(1-1/) 



(1 - 

> (l + fcy)(log2-y(l + 2/-logj/)) . (40) 



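Writing $y = d/2^k$ and substituting into (40), we get that for all $1/2 < \alpha < 1$,

$$\begin{aligned}\phi(\alpha) = \phi(1 - d2^{-k}) &> (1 + kd2^{-k})\left(\log 2 - d2^{-k}\left(1 + d2^{-k} - \log(d2^{-k})\right)\right)\\ &= \log 2 + d(\log d - 1)2^{-k} - \frac{d^2}{4^k}\left(1 + k\left(1 + d2^{-k} - \log(d2^{-k})\right)\right)\\ &\ge \log 2 - 2^{-k} - \frac{d^2}{4^k}\left(1 + k\left(1 + d2^{-k} - \log(d2^{-k})\right)\right)\\ &= \log 2 - 2^{-k} - (1-\alpha)^2\left(1 + k\left(2 - \alpha - \log(1-\alpha)\right)\right)\\ &\equiv b(\alpha),\end{aligned} \tag{41}$$

where the third line uses $d(\log d - 1) \ge -1$ for all $d > 0$.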
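To see how much is lost in (41), one can compare $\phi$ and $b$ numerically. This is only an illustration: the sketch below (helper names are ours) parametrizes both functions by $y = 1 - \alpha$ to avoid cancellation near $\alpha = 1$ and evaluates them on a logarithmic grid. It is restricted to moderate $k$, because the gap between the two sides is smallest near $d = y2^k = 1$, where it is of order $k^2 4^{-k}$ and would be swamped by double-precision rounding for large $k$.

```python
import math

LOG2 = math.log(2)


def h_of_y(y: float) -> float:
    """Entropy h(alpha) in nats for alpha = 1 - y (h is symmetric), 0 < y < 1."""
    return -y * math.log(y) - (1 - y) * math.log1p(-y)


def phi(y: float, k: int) -> float:
    """phi(alpha) = (log 2 - h(alpha)) / alpha^k with alpha = 1 - y."""
    return (LOG2 - h_of_y(y)) / (1 - y) ** k


def b(y: float, k: int) -> float:
    """b(alpha) of (41): log 2 - 2^{-k} - (1-a)^2 (1 + k(2 - a - log(1-a))), a = 1 - y."""
    return LOG2 - 2.0**-k - y * y * (1 + k * (1 + y - math.log(y)))


if __name__ == "__main__":
    k = 10
    for d in (0.25, 1.0, 4.0):
        y = d / 2**k  # y = d 2^{-k}, as in the text
        print(d, phi(y, k), b(y, k), phi(y, k) - b(y, k))
```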
Since $\phi$ is differentiable, to bound it in $(4/5, 1]$ it suffices to consider its value at $4/5$, at $1$, and wherever $\phi'$ vanishes, where

$$\phi'(\alpha) = \frac{\alpha\log\alpha - \alpha\log(1-\alpha) - k\log 2 + k h(\alpha)}{\alpha^{k+1}}. \tag{42}$$

We start by observing that for $k > 6$,

$$\phi'(4/5) < 0.$$

At the other end, we see that

$$\lim_{\alpha\to 1}\frac{\phi'(\alpha)}{\log(1-\alpha)} = -1,$$

implying that the derivative of $\phi$ becomes positively infinite as we approach 1. Therefore, we can limit our search to the interior of $(4/5, 1)$ for $k > 6$.
By setting $\phi'$ to zero, (42) gives

$$\log(1-\alpha) = \log\alpha - \frac{k\log 2 - k h(\alpha)}{\alpha},$$

which, since $1/2 < \alpha < 1$, implies

$$\log(1-\alpha) < -k(\log 2 - h(\alpha)). \tag{43}$$

Moreover, since $\log 2 - h(4/5) > 1/6$ and $h$ is decreasing in $(1/2, 1]$, we see that (43) implies $\alpha > 1 - e^{-k/6}$ for all $k$. Note now that if $\alpha > 1 - e^{-ck}$ for any $c > 0$, then (39) implies $h(\alpha) < e^{-ck}(1 + e^{-ck} + ck)$. Since $\alpha > 1 - e^{-k/6}$, we thus get

$$h(\alpha) < e^{-k/6}\left(1 + e^{-k/6} + k/6\right) < e^{-k/6}\left(2 + k/6\right) \equiv Q(k). \tag{44}$$

Plugging (44) into (43), we conclude that

$$\alpha > 1 - e^{-k(\log 2 - Q(k))} \equiv \alpha^*_k. \tag{45}$$

Since for $k > 12$ we have $\alpha^*_k > 4/5$, this means that $\phi$ is decreasing in $(4/5, \alpha^*_k]$ for $k > 12$.

Note now that the function $b$ bounding $\phi$ from below in (41) is increasing in $[0, 1]$. Combined with the fact that $\phi$ is decreasing in $(4/5, \alpha^*_k]$, this implies that $b(\alpha^*_k)$ is a lower bound for $\phi$ in $(4/5, 1]$, i.e.,

$$\phi(\alpha) > b(\alpha^*_k) > \log 2 - 2^{-k} - \frac{2}{k2^k}, \tag{46}$$

where the second inequality in (46) holds for all $k > 22$. Combining (35) and (46), we get that for all $k > 22$, if

$$r < 2^k\log 2 - 2(k+1)\log 2 - 1 - 2/k,$$

then $g_r(1/2) > g_r(\alpha)$ for all $\alpha \neq 1/2$.
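Several numerical facts used in this section can be checked mechanically: $\log 2 - h(4/5) > 1/6$, the sign of $\phi'(4/5)$ for $k > 6$ (through the numerator in (42)), $\alpha^*_k > 4/5$ for $k > 12$, and the second inequality in (46) for $k > 22$. In the sketch below the helper names are ours, and, to avoid losing $2^{-k}$ against $\log 2$ in floating point, (46) is checked in the equivalent form $(1-\alpha^*_k)^2\left(1 + k(2 - \alpha^*_k - \log(1-\alpha^*_k))\right) < 2/(k2^k)$. A finite range of $k$ is of course only a spot check.

```python
import math

LOG2 = math.log(2)


def h(a: float) -> float:
    """Entropy in nats, 0 < a < 1."""
    return -a * math.log(a) - (1 - a) * math.log(1 - a)


def dphi_numerator(a: float, k: int) -> float:
    """Numerator of phi'(alpha) in (42); the denominator alpha^(k+1) is positive."""
    return a * math.log(a) - a * math.log(1 - a) - k * LOG2 + k * h(a)


def log_y_star(k: int) -> float:
    """log(1 - alpha*_k) = -k (log 2 - Q(k)), with Q(k) = e^{-k/6} (2 + k/6) as in (44)."""
    q = math.exp(-k / 6) * (2 + k / 6)
    return -k * (LOG2 - q)


def tail_46(k: int) -> float:
    """(1 - a)^2 (1 + k(2 - a - log(1 - a))) at a = alpha*_k, i.e. log 2 - 2^{-k} - b(alpha*_k)."""
    ly = log_y_star(k)
    y = math.exp(ly)
    return y * y * (1 + k * (1 + y - ly))


if __name__ == "__main__":
    for k in (13, 23, 50):
        print(k, dphi_numerator(0.8, k), 1 - math.exp(log_y_star(k)),
              tail_46(k), 2 / (k * 2**k))
```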



7 Further refinement: truncation and weighting 

Given a $k$-CNF formula $F$ on $n$ variables, recall that $S = S(F) \subseteq \{0,1\}^n$ is the set of satisfying truth assignments of $F$. Recall also that for $\sigma \in \{0,1\}^n$, by $H(\sigma, F)$ we denote the number of satisfied literal occurrences in $F$ under $\sigma$ less the number of unsatisfied literal occurrences. Let $S^+ = \{\sigma \in S : H(\sigma, F) \ge 0\}$. For any $0 < \gamma < 1$ let

$$X_+ = \sum_{\sigma\in S^+}\gamma^{H(\sigma,F)}.$$
In computing the second moment of $X$ in the previous sections, it becomes clear that one needs to control the contribution to $E[X^2]$ from pairs of truth assignments with high overlap. Close examination of these pairs shows that the dominant contributions come from those pairs amongst them that have fewer than half of their literals satisfied. If we compute the second moment of $X_+$ instead, these highly correlated pairs are avoided. Our argument for this is motivated by Cramér's classical "change of measure" technique in large deviation theory.

Specifically, let $\epsilon_0 < 1$ satisfy

$$\epsilon_0 = \frac{1}{(2-\epsilon_0)^{k-1}}. \tag{47}$$

Lemma 7 below asserts that if $\gamma^2 = 1 - \epsilon_0$, where $\epsilon_0$ is specified by (47), then the first moments of $X$ and $X_+$ are comparable.
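Since $\epsilon_0$ is defined only implicitly by (47), a numerical root finder is handy for experiments. The sketch below (the names `eps0` and `eps_upper` are ours) finds the root in $(0, 1/2)$ by bisection; note that $\epsilon = 1$ also solves (47), and the bracket $[0, 1/2]$ excludes it. It checks the closed form $(3-\sqrt5)/2$ for $k = 3$ and the bracket $2^{1-k} < \epsilon_0 < 2^{1-k} + 3k4^{-k}$, whose upper end is the bound on $\epsilon$ used in (35).

```python
import math


def eps0(k: int) -> float:
    """Root in (0, 1/2) of eps * (2 - eps)**(k - 1) = 1, i.e. of (47), by bisection (k >= 3).

    The bracket [0, 1/2] excludes the spurious root eps = 1.
    """
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * (2.0 - mid) ** (k - 1) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def eps_upper(k: int) -> float:
    """The upper bound 2^{1-k} + 3k 4^{-k} on eps0 that enters (35)."""
    return 2.0 ** (1 - k) + 3 * k * 4.0 ** (-k)


if __name__ == "__main__":
    for k in (3, 4, 5, 10, 20):
        print(k, eps0(k), 2.0 ** (1 - k), eps_upper(k))
```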



Lemma 7 If $\gamma^2 = 1 - \epsilon_0$, then as $n \to \infty$,

$$\frac{E[X_+]}{E[X]} \to \frac{1}{2}.$$



Let $\sigma, \tau$ be any pair of truth assignments that agree on $z = \alpha n$ variables. If we write $\gamma^2 = 1 - \epsilon$, then from (18) we have

$$E\left[\gamma^{H(\sigma,c)+H(\tau,c)}\,\mathbf{1}_{\sigma,\tau\in S(c)}\right] = \frac{(2-2\epsilon+\alpha\epsilon^2)^k - 2(1-\epsilon+\alpha\epsilon)^k + \alpha^k}{2^k(1-\epsilon)^k} \equiv f(\alpha,\epsilon). \tag{48}$$

(Observe that the function $f(\alpha,\epsilon)$ in (48) above is identical to $f(\alpha)(2(1-\epsilon))^{-k}$, where $f(\alpha)$ is as in (19). In the earlier sections, since $\epsilon$ was fixed, this dependence on $\epsilon$ was suppressed to simplify notation.)
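Formula (48) can be confirmed by brute force for small $k$. Each literal of $c$ independently falls into one of four cases, according to whether its variable is one on which $\sigma$ and $\tau$ agree (probability $\alpha$) and whether it is satisfied; the sketch below enumerates all $4^k$ combinations, keeps those in which both $\sigma$ and $\tau$ satisfy $c$, and adds up $\gamma^{H(\sigma,c)+H(\tau,c)}$ times the probability. The function names are hypothetical, and this is a check of the algebra rather than part of the argument.

```python
import itertools
import math


def f_pair(alpha: float, k: int, eps: float) -> float:
    """Closed form (48)."""
    num = ((2 - 2 * eps + alpha * eps**2) ** k
           - 2 * (1 - eps + alpha * eps) ** k
           + alpha**k)
    return num / (2**k * (1 - eps) ** k)


def f_pair_enumerated(alpha: float, k: int, eps: float) -> float:
    """E[gamma^(H(s,c)+H(t,c)) 1{s and t both satisfy c}] by summing over all literal outcomes.

    Per literal: (satisfied under s, satisfied under t, probability, gamma^(h_s + h_t)),
    where s and t agree on the literal's variable with probability alpha.
    """
    g2 = 1 - eps  # gamma^2
    cases = [
        (True, True, alpha / 2, g2),          # agree, both satisfied: gamma^(+2)
        (False, False, alpha / 2, 1 / g2),    # agree, both unsatisfied: gamma^(-2)
        (True, False, (1 - alpha) / 2, 1.0),  # disagree: +1 and -1 cancel
        (False, True, (1 - alpha) / 2, 1.0),
    ]
    total = 0.0
    for combo in itertools.product(cases, repeat=k):
        if any(c[0] for c in combo) and any(c[1] for c in combo):
            total += math.prod(c[2] * c[3] for c in combo)
    return total


if __name__ == "__main__":
    print(f_pair(0.7, 4, 0.2), f_pair_enumerated(0.7, 4, 0.2))
```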

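Thus, if $F$ is a random formula consisting of $m = rn$ independent clauses, then for any $\delta^2 = 1 - \epsilon \ge \gamma^2$,

$$E\left[\gamma^{H(\sigma,F)+H(\tau,F)}\mathbf{1}_{\sigma,\tau\in S^+(F)}\right] \le E\left[\delta^{H(\sigma,F)+H(\tau,F)}\mathbf{1}_{\sigma,\tau\in S^+(F)}\right] \le E\left[\delta^{H(\sigma,F)+H(\tau,F)}\mathbf{1}_{\sigma,\tau\in S(F)}\right] = f(z/n,\epsilon)^m, \tag{49}$$

where the first inequality holds because $H(\sigma,F), H(\tau,F) \ge 0$ on $S^+(F)$ and $\gamma \le \delta \le 1$, and the second because $S^+(F) \subseteq S(F)$.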



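The crucial point is that (49) holds for any $\epsilon \le 1 - \gamma^2$, allowing us to optimize $\epsilon$ with respect to $\alpha$. In particular, if $\gamma^2 = 1 - \epsilon_0$, then (49) implies

$$E\left[\gamma^{H(\sigma,F)+H(\tau,F)}\mathbf{1}_{\sigma,\tau\in S^+(F)}\right] \le \inf_{\epsilon\le\epsilon_0} f(z/n,\epsilon)^m.$$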



Thus, following the derivation of (21), we deduce that

$$E[X_+^2] \le 2^n\sum_{z=0}^{n}\binom{n}{z}\inf_{\epsilon\le\epsilon_0} f(z/n,\epsilon)^m. \tag{50}$$



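Let us define

$$g_r(\alpha,\epsilon) = \frac{f(\alpha,\epsilon)^r}{\alpha^\alpha(1-\alpha)^{1-\alpha}}.$$

Observe that by Lemma 7 and (25), for all sufficiently large $n$,

$$5E[X_+]^2 > E[X]^2 = 2^n g_r(1/2,\epsilon_0)^n. \tag{51}$$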



Assume now that there exists a piecewise-constant function $\xi$ with $\xi(\alpha) \le \epsilon_0$ such that for some value of $r$ we have $g_r(1/2,\epsilon_0) > g_r(\alpha,\xi(\alpha))$ for all $\alpha \neq 1/2$. Then, by decomposing the sum in (50) along the pieces of $\xi$ and applying Lemma [?] to each piece, we can conclude that $E[X_+^2] < C \times E[X_+]^2$, for some $C = C(k)$. Lemma [?] and Corollary [?] then imply $r_k \ge r$.
Let

$$p_k = 2^k\log 2 - \frac{\log 2}{2}(k+1) - 1 - 50k^3 2^{-k}.$$



We will prove 
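Lemma 8 Let

$$\xi(\alpha) = \begin{cases}\epsilon_0 & \text{if } \alpha\in[1/10, 9/10],\\ \epsilon_0/2 & \text{otherwise.}\end{cases}$$

For all $k > 166$, if $r < p_k$ then $g_r(1/2,\epsilon_0) > g_r(\alpha,\xi(\alpha))$ for all $\alpha \neq 1/2$, and the second derivative of $g_r$ with respect to $\alpha$ is negative at $\alpha = 1/2$.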

To prove Lemma 8 we first observe that since $\xi$ is symmetric around 1/2, Lemma [?] implies that we only need to consider the case $\alpha > 1/2$. Also, since $\xi(\alpha) = \epsilon_0$ for $\alpha \in [1/2, 9/10]$, Corollary [?] establishes both our claim regarding the second derivative of $g_r$ at $\alpha = 1/2$ and $g_r(1/2,\epsilon_0) > g_r(\alpha,\epsilon_0)$ for $\alpha \in (1/2, 9/10]$.

Thus, besides Lemma 7, it suffices to prove that

Lemma 9 For all $k > 166$, if $r < p_k$ then for all $\alpha \in (9/10, 1]$, $g_r(1/2,\epsilon_0) > g_r(\alpha,\epsilon_0/2)$.

8 Proof of Lemma 7

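By linearity of expectation, it suffices to prove that for $\gamma^2 = 1 - \epsilon_0$ and every $\sigma$,

$$\frac{E\left[\gamma^{H(\sigma,F)}\mathbf{1}_{\sigma\in S^+(F)}\right]}{E\left[\gamma^{H(\sigma,F)}\mathbf{1}_{\sigma\in S(F)}\right]} \to \frac12. \tag{52}$$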



Recalling that formulas in our model are sequences of i.i.d. random literals $\ell_1, \ldots, \ell_{km}$, let $P(\cdot)$ denote the probability assigned by our distribution to any such sequence, i.e., $(2n)^{-km}$. Now, fix any truth assignment $\sigma$ and consider an auxiliary distribution $P_\gamma$ on $k$-CNF formulas where the $km$ literals are again i.i.d., but where now for each fixed literal $\ell$

$$P_\gamma[H(\sigma,\ell) = 1] = \frac{\gamma}{\gamma+\gamma^{-1}}.$$

Observe that since $\gamma < 1$ this probability is at most 1/2. Thus,

$$E_\gamma[H(\sigma,\ell)] = \frac{\gamma-\gamma^{-1}}{\gamma+\gamma^{-1}} = \frac{\gamma^2-1}{\gamma^2+1} = -\frac{\epsilon_0}{2-\epsilon_0}.$$
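So, for a random $k$-clause $c$,

$$E_\gamma\left[H(\sigma,c)\mathbf{1}_{\sigma\in S(c)}\right] = E_\gamma[H(\sigma,c)] - E_\gamma\left[-k\,\mathbf{1}_{\sigma\notin S(c)}\right] = -\frac{k\epsilon_0}{2-\epsilon_0} + k\left(\frac{\gamma^{-1}}{\gamma+\gamma^{-1}}\right)^k = -\frac{k\epsilon_0}{2-\epsilon_0} + \frac{k}{(2-\epsilon_0)^k}.$$

Since $\epsilon_0 = 1/(2-\epsilon_0)^{k-1}$, we see that $E_\gamma[H(\sigma,c)\mathbf{1}_{\sigma\in S(c)}] = 0$.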
By literal independence, for any specific clause $c_0$,

$$P_\gamma(c_0) = \gamma^{H(\sigma,c_0)}\left(\frac{2}{\gamma+\gamma^{-1}}\right)^k P(c_0). \tag{53}$$
Let $Z(\gamma) = E_\gamma[\mathbf{1}_{\sigma\in S(c)}]$ and $Z_1(\gamma) = Z(\gamma)\left((\gamma+\gamma^{-1})/2\right)^k$. For any clause $c_0$, define

$$\hat P_\gamma(c_0) = \frac{P_\gamma(c_0)\mathbf{1}_{\sigma\in S(c_0)}}{Z(\gamma)} = \frac{\gamma^{H(\sigma,c_0)}P(c_0)\mathbf{1}_{\sigma\in S(c_0)}}{Z_1(\gamma)}, \tag{54}$$

where the second equality follows from (53). Now pick $m$ i.i.d. clauses with the distribution in (54). Any fixed formula $F$ will be obtained with probability

$$\hat P_\gamma(F) = \frac{\gamma^{H(\sigma,F)}P(F)\mathbf{1}_{\sigma\in S(F)}}{Z_1(\gamma)^m}. \tag{55}$$



Since, by the computation above, $H(\sigma,c)$ has mean zero under the distribution (54), the central limit theorem yields

$$\hat P_\gamma[H(\sigma,F) \ge 0] \to \frac12$$

as $n \to \infty$. By (55), this is equivalent to (52).
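The two ingredients of this proof, the zero mean of $H(\sigma,c)$ under the tilted clause law (54) and the resulting limit 1/2, can be illustrated numerically. Under $P_\gamma$ each literal is satisfied by $\sigma$ independently with probability $p = \gamma/(\gamma+\gamma^{-1}) = (1-\epsilon_0)/(2-\epsilon_0)$, so the number $s$ of satisfied literals is binomial, and (54) conditions on $s \ge 1$; then $H(\sigma,c) = 2s - k$. The simulation below is only an illustration: the values of $m$, the number of trials and the seed are arbitrary choices, and the helper names are ours.

```python
import math
import random


def eps0(k: int) -> float:
    """Root in (0, 1/2) of eps * (2 - eps)**(k - 1) = 1, by bisection (k >= 3)."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * (2.0 - mid) ** (k - 1) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tilted_law(k: int) -> tuple[list[int], list[float]]:
    """Law of s = #literals of c satisfied by sigma under (54), i.e. binomial conditioned on s >= 1."""
    e = eps0(k)
    p = (1 - e) / (2 - e)  # = gamma / (gamma + 1/gamma) with gamma^2 = 1 - eps0
    values = list(range(1, k + 1))
    weights = [math.comb(k, s) * p**s * (1 - p) ** (k - s) for s in values]
    z = sum(weights)  # = Z(gamma) = P_gamma[sigma satisfies c]
    return values, [w / z for w in weights]


def mean_H(k: int) -> float:
    """Mean of H(sigma, c) = 2s - k under (54); (47) is exactly the condition that it is 0."""
    values, probs = tilted_law(k)
    return sum((2 * s - k) * q for s, q in zip(values, probs))


def frac_nonnegative(k: int, m: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P[H(sigma, F) >= 0] for F made of m i.i.d. clauses drawn from (54)."""
    rng = random.Random(seed)
    values, probs = tilted_law(k)
    hits = 0
    for _ in range(trials):
        s = rng.choices(values, weights=probs, k=m)
        hits += sum(2 * x - k for x in s) >= 0
    return hits / trials


if __name__ == "__main__":
    print(mean_H(4), frac_nonnegative(4, m=1000, trials=1000))
```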

9 Proof of Lemma 9

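Write $\epsilon_1 = \epsilon_0/2$ (to simplify notation). Observe that the inequality $g_r(1/2,\epsilon_0) > g_r(\alpha,\epsilon_1)$ is equivalent to

$$\left(\frac{f(\alpha,\epsilon_1)}{f(1/2,\epsilon_0)}\right)^r < 2\alpha^\alpha(1-\alpha)^{1-\alpha}. \tag{56}$$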

If $h(\alpha) = -\alpha\log\alpha - (1-\alpha)\log(1-\alpha)$ denotes the entropy function, then (56) is equivalent to

$$r < \frac{\log 2 - h(\alpha)}{\log(1+w)},$$

where

$$w = \frac{f(\alpha,\epsilon_1) - f(1/2,\epsilon_0)}{f(1/2,\epsilon_0)}.$$

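We will prove $f(\alpha,\epsilon_1) - f(1/2,\epsilon_0) < 2^{1-k}$. Therefore, recalling that for all $x > -1$, $x \neq 0$,

$$\frac{1}{\log(1+x)} \ge \frac1x + \frac12 - \frac{x}{12},$$

we see that (56) holds if

$$\frac{r}{\log 2 - h(\alpha)} < \frac{f(1/2,\epsilon_0)}{f(\alpha,\epsilon_1) - f(1/2,\epsilon_0)} + \frac12 - \frac{1}{2^k f(1/2,\epsilon_0)}. \tag{57}$$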
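The elementary inequality used to obtain (57) is easy to spot-check on a grid; this is an illustration, not a proof. The sketch evaluates the gap $1/\log(1+x) - (1/x + 1/2 - x/12)$ with `math.log1p`. Points within $10^{-3}$ of 0 are skipped, since there the gap is of order $x^2/24$ and rounding in $1/\log(1+x) - 1/x$ starts to dominate it. The function name is ours.

```python
import math


def gap(x: float) -> float:
    """1/log(1+x) - (1/x + 1/2 - x/12); the inequality claims this is >= 0 for x > -1, x != 0."""
    return 1.0 / math.log1p(x) - (1.0 / x + 0.5 - x / 12.0)


if __name__ == "__main__":
    for x in (-0.9, -0.5, 0.01, 0.5, 2.0, 10.0):
        print(x, gap(x))
```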

To get a lower bound on $f(1/2,\epsilon_0)$ we use the upper bound for $\epsilon_0$ from (26). Thus, for all $k > 5$

(2-2£ + £2/2) fc -2(l-£ /2) fc + (l/2) fc 



2 fe /(l/2,£o) 



> 



(l-eo) fe 

(2 - 2£ ) fc - 2 

(l-eo) fc 
2 



(l-^o) fc 
> 2 k - 2 - 2(1 + ke ) 

>2 k -2- k2- k+1 . (58) 
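To get an upper bound on the numerator of $w$ we let $\alpha = 1/2 + x$ and consider the sum in (27) (recall that (37) holds for all $\epsilon$ and that $f(\alpha)$ in (27) is merely $2^k(1-\epsilon)^k f(\alpha,\epsilon)$). First, we observe that for all $\epsilon \in [0, 1)$,

$$\begin{aligned}2^k(1-\epsilon)^k f(\alpha,\epsilon) &= 2^{-k}\sum_{j=0}^{k}\binom{k}{j}(2\alpha-1)^j\left[(2-\epsilon)^{k-j}\epsilon^j - 1\right]^2\\ &= T_2(\alpha,\epsilon) + 2^{-k}\sum_{j=2}^{k}\binom{k}{j}(2\alpha-1)^j\left[(2-\epsilon)^{k-j}\epsilon^j - 1\right]^2\\ &\le T_2(\alpha,\epsilon) + \alpha^k - k2^{-k}(2\alpha-1) - 2^{-k},\end{aligned} \tag{59}$$

where $T_2(\alpha,\epsilon) = 2^{-k}\left[(2-\epsilon)^k - 1\right]^2 + 2^{-k}k(2\alpha-1)\left[(2-\epsilon)^{k-1}\epsilon - 1\right]^2$ collects the terms with $j = 0, 1$.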



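Next, we will prove that

$$\frac{T_2(\alpha,\epsilon_1)}{2^k(1-\epsilon_1)^k} - f(1/2,\epsilon_0) = \frac{\left((2-\epsilon_1)^k-1\right)^2}{4^k(1-\epsilon_1)^k} + \frac{k(2\alpha-1)\left((2-\epsilon_1)^{k-1}\epsilon_1-1\right)^2}{4^k(1-\epsilon_1)^k} - \frac{\left((2-\epsilon_0)^k-1\right)^2}{4^k(1-\epsilon_0)^k} < \frac{\alpha k\,2^{-2k-1}}{(1-\epsilon_0)^{k+1}}. \tag{60}$$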

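For this, define $T_1(\epsilon) = 1 - (2-\epsilon)^{k-1}\epsilon$, so that $T_1(\epsilon_0) = 0$. For $\epsilon < \epsilon_0$ we infer that

$$0 < T_1(\epsilon) < 1 - (2-\epsilon_0)^{k-1}\epsilon = 1 - \epsilon/\epsilon_0. \tag{61}$$

Therefore, the function

$$T_2(\epsilon) = \frac{k(2\alpha-1)\,T_1(\epsilon)^2}{(1-\epsilon)^k}$$

satisfies

$$T_2(\epsilon_1) < \frac{k(2\alpha-1)}{4(1-\epsilon_1)^k} < \frac{k(\alpha-1/2)}{2(1-\epsilon_0)^{k+1}}. \tag{62}$$

Next, define

$$T_3(\epsilon) = \frac{\left((2-\epsilon)^k-1\right)^2}{(1-\epsilon)^k}.$$

Differentiation gives

$$T_3'(\epsilon) = -k\,\frac{(2-\epsilon)^k - 1}{(1-\epsilon)^{k+1}}\,T_1(\epsilon).$$

Since $(2-\epsilon)^k/(1-\epsilon)^{k+1}$ is increasing in $\epsilon$, we deduce using (61) that for $\epsilon < \epsilon_0$,

$$|T_3'(\epsilon)| < k\,\frac{(2-\epsilon_0)^k}{(1-\epsilon_0)^{k+1}}\left(1 - \frac{\epsilon}{\epsilon_0}\right).$$

As $\int_{\epsilon_1}^{\epsilon_0}(1 - \epsilon/\epsilon_0)\,d\epsilon = \epsilon_0/8$, and $(2-\epsilon_0)^k\epsilon_0 = 2 - \epsilon_0$ by (47), we conclude that

$$T_3(\epsilon_1) - T_3(\epsilon_0) < \frac{k(2-\epsilon_0)}{8(1-\epsilon_0)^{k+1}} < \frac{k}{4(1-\epsilon_0)^{k+1}}. \tag{63}$$

Adding the inequalities (62) and (63), then dividing by $4^k$, yields (60).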
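The chain (60) to (63) can be checked numerically for moderate $k$. The sketch below uses the functions $T_1, T_2, T_3$ from the text (with $\epsilon_0$ computed by our own bisection helper `eps0`); it verifies $T_1(\epsilon_0) = 0$ and $0 < T_1(\epsilon_1) < 1/2$, compares the closed form of $T_3'$ with a central difference, and checks $4^k$ times both sides of (60) on a grid of $\alpha \in (9/10, 1]$. Because $T_3$ is of order $4^k$ and (60) involves a difference of two such values, the check is limited to $k \le 16$, where double-precision rounding stays far below the slack in (60).

```python
import math


def eps0(k: int) -> float:
    """Root in (0, 1/2) of eps * (2 - eps)**(k - 1) = 1, by bisection (k >= 3)."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * (2.0 - mid) ** (k - 1) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def T1(eps: float, k: int) -> float:
    return 1 - (2 - eps) ** (k - 1) * eps


def T2(eps: float, k: int, alpha: float) -> float:
    return k * (2 * alpha - 1) * T1(eps, k) ** 2 / (1 - eps) ** k


def T3(eps: float, k: int) -> float:
    return ((2 - eps) ** k - 1) ** 2 / (1 - eps) ** k


def dT3(eps: float, k: int) -> float:
    """Closed form T3'(eps) = -k ((2 - eps)^k - 1) T1(eps) / (1 - eps)^(k+1)."""
    return -k * ((2 - eps) ** k - 1) * T1(eps, k) / (1 - eps) ** (k + 1)


def sides_of_60(alpha: float, k: int) -> tuple[float, float]:
    """4^k times the left and right sides of (60), with eps1 = eps0 / 2."""
    e0 = eps0(k)
    e1 = e0 / 2
    lhs = T3(e1, k) + T2(e1, k, alpha) - T3(e0, k)
    rhs = alpha * k / (2 * (1 - e0) ** (k + 1))
    return lhs, rhs


if __name__ == "__main__":
    for k in (6, 10, 16):
        print(k, sides_of_60(0.95, k))
```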






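Combining (59) and (60), and requiring $k > 6$ for (64), we get

$$\begin{aligned} f(\alpha,\epsilon_1) - f(1/2,\epsilon_0) &\le \frac{T_2(\alpha,\epsilon_1)}{2^k(1-\epsilon_1)^k} - f(1/2,\epsilon_0) + \frac{\alpha^k - k2^{-k}(2\alpha-1) - 2^{-k}}{2^k(1-\epsilon_1)^k}\\ &< \frac{\alpha k\, 2^{-2k-1}}{(1-\epsilon_0)^{k+1}} + \frac{\alpha^k - k2^{-k}(2\alpha-1) - 2^{-k}}{2^k(1-\epsilon_1)^k}\\ &= \frac{\alpha^k - \alpha k2^{-k-1}\left(4 - \frac{(1-\epsilon_1)^k}{(1-\epsilon_0)^{k+1}}\right) + 2^{-k}(k-1)}{2^k(1-\epsilon_1)^k}\\ &< \frac{\alpha^k - \alpha k2^{-k-1}\left(3 - k2^{-k+1}\right) + 2^{-k}(k-1)}{2^k(1-\epsilon_1)^k} && (64)\\ &< \frac{\alpha^k - 3\alpha k2^{-k-1} + 2^{-k}(k-1) + 4^{-k}k^2}{2^k(1-\epsilon_1)^k}. && (65)\end{aligned}$$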

Observe now that for $k > 3$, (65) establishes our promised claim $f(\alpha,\epsilon_1) - f(1/2,\epsilon_0) < 2^{-k+1}$. Moreover, combining (58) and (65) we get (66), while the fact $\alpha > 9/10$ implies (67)

/(1/2 ' £0) > (l ei )» x 



/(a,ei) -/(l/2,£ ) a fc -3afc2- fc - 1 +2- fc (fc-l) + fc 2 4- fc 

> ^- ei ) fcx a*-3a fc 2-'-7+ 2 - fc(fc -i) -( 2 /3) fc . (67) 

Recall now that for any $0 < \alpha < 1$ and $0 < q < \alpha^k$,

$$\frac{1}{\alpha^k - q} > 1 + k(1-\alpha) + q.$$

Observe that $0 < 3\alpha k2^{-k-1} - 2^{-k}(k-1) < \alpha^k$ for $\alpha > 2/3$. Since $\alpha > 9/10$, we thus have

$$\frac{1}{\alpha^k - 3\alpha k2^{-k-1} + 2^{-k}(k-1)} > 1 + k(1-\alpha) + 3\alpha k2^{-k-1} - 2^{-k}(k-1).$$

Thus, by (57) and (67), we see that (56) holds as long as $r < (1-\epsilon_1)^k\phi(\alpha) - 2\times(2/3)^k$, where

$$\phi(\alpha) = (\log 2 - h(\alpha))\left(2^k(k+1) - 3k - \tfrac12 - \alpha k\left(2^k - \tfrac72\right)\right).$$

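We are thus left to minimize $\phi$ in $(9/10, 1]$. It will be convenient to define

$$B = 2^k(k+1) - 3k - \tfrac12, \tag{68}$$

$$C = k\left(2^k - \tfrac72\right), \tag{69}$$

and rewrite

$$\phi(\alpha) = (\log 2 - h(\alpha))(B - \alpha C).$$

Since $\phi$ is differentiable, its minima can only occur at $9/10$, at $1$, or where

$$\phi'(\alpha) = \log\left(\frac{\alpha}{1-\alpha}\right)(B - \alpha C) - (\log 2 - h(\alpha))\,C = 0. \tag{70}$$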

Note now that

$$\lim_{\alpha\to 1}\frac{\phi'(\alpha)}{\log(1-\alpha)} = -(B - C) < 0$$

and, thus, the derivative of $\phi$ becomes positively infinite as we approach 1. At the same time,

$$\phi'(9/10) < 2.2B - 2.3C,$$

which is negative for $k > 23$. Therefore, $\phi$ is minimized in the interior of $(9/10, 1]$ for all $k > 23$. Setting the derivative of $\phi$ to zero gives

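$$\begin{aligned}-\log(1-\alpha) &= (\log 2 - h(\alpha))\times\frac{C}{B - \alpha C} - \log\alpha\\ &= (\log 2 - h(\alpha))\times\frac{k}{1 + k(1-\alpha) + \frac{k+6}{2^{k+1}-7}} - \log\alpha.\end{aligned} \tag{71}$$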



By "bootstrapping" we will derive a tightening series of bounds on the solution of (71) in $\alpha \in (9/10, 1)$. Note first that we have an easy upper bound,

$$-\log(1-\alpha) < k\log 2 - \log\alpha. \tag{72}$$

At the same time, if $k > 3$ then $(k+6)/(2^{k+1}-7) < 1$, implying

$$-\log(1-\alpha) > (\log 2 - h(\alpha))\,\frac{k}{2 + k(1-\alpha)} - \log\alpha. \tag{73}$$

If we write $k(1-\alpha) = D$, then (73) becomes

$$\log\frac{k}{D} > (\log 2 - h(\alpha))\,\frac{k}{D+2} - \log\alpha. \tag{74}$$

By inspection, if $D > 3$ the r.h.s. of (74) is greater than the l.h.s. for all $\alpha > 9/10$, yielding a contradiction. Therefore, $k(1-\alpha) < 3$ for all $k > 3$. Since $\log 2 - h(\alpha) > 0.36$ for $\alpha > 9/10$, we see that for $k > 3$, (73) implies

$$-\log(1-\alpha) > 0.07k \tag{75}$$

or, equivalently,

$$1 - \alpha < e^{-0.07k}. \tag{76}$$

Observe now that (76) implies

$$k(1-\alpha) < ke^{-0.07k}, \tag{77}$$

and, hence, as $k$ increases the denominator of (71) actually approaches 1.
To bootstrap, we first note that since $\alpha > 1/2$ we have

$$\begin{aligned}h(\alpha) &< -2(1-\alpha)\log(1-\alpha) && (78)\\ &< 2e^{-0.07k}(k\log 2 - \log 0.9) && (79)\\ &< 2ke^{-0.07k}, && (80)\end{aligned}$$

where (79) relies on (76) and (72). Moreover, $\alpha > 1/2$ implies $-\log\alpha < 2(1-\alpha)$ which, by (76), implies $-\log\alpha < 2e^{-0.07k}$. Thus, starting with (71), using (77), taking $k > 3$ and using (80), and finally using $1/(1+x) > 1 - x$ for all $x > 0$, we get

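$$\begin{aligned}-\log(1-\alpha) &> \frac{k(\log 2 - 2ke^{-0.07k})}{1 + ke^{-0.07k} + \frac{k+6}{2^{k+1}-7}}\\ &> \frac{k(\log 2 - 2ke^{-0.07k})}{1 + 2ke^{-0.07k}}\\ &> k(\log 2 - 2ke^{-0.07k})(1 - 2ke^{-0.07k})\\ &> k\log 2 - 4k^2e^{-0.07k}.\end{aligned} \tag{81}$$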
For $k > 166$, $4k^2e^{-0.07k} < 1$. Thus, for such $k$, (81) implies $1 - \alpha < 3\times 2^{-k}$. This, in turn, implies $-\log\alpha < 2(1-\alpha) < 6\times 2^{-k}$ and so, by (78) and (72), we have that for all $k > 166$ and $\alpha > 9/10$,

$$h(\alpha) < 6\times 2^{-k}(k\log 2 - \log\alpha) < 5k2^{-k}. \tag{82}$$

Plugging (82) into (71) to bootstrap again, we get (analogously to the derivation of (81)) that

$$\begin{aligned}-\log(1-\alpha) &> \frac{k(\log 2 - 5k2^{-k})}{1 + 3k2^{-k} + \frac{k+6}{2^{k+1}-7}}\\ &> \frac{k(\log 2 - 5k2^{-k})}{1 + 6k2^{-k}}\\ &> k(\log 2 - 5k2^{-k})(1 - 6k2^{-k})\\ &> k\log 2 - 11k^2 2^{-k}.\end{aligned}$$

Since $e^x < 1 + 2x$ for $0 < x < 1$ and $11k^2 2^{-k} < 1$ for $k > 10$, we see that

$$1 - \alpha < 2^{-k} + 22k^2 2^{-2k}.$$

Plugging into (72) the fact $-\log\alpha < 6\times 2^{-k}$, we get $-\log(1-\alpha) < k\log 2 + 6\times 2^{-k}$. Using that $e^{-x} > 1 - x$ for $x > 0$, we get the closely matching bound in the other direction,

$$1 - \alpha > 2^{-k} - 6\times 2^{-2k}.$$

Thus, we see that for $k > 166$, $\phi$ is minimized at an $\alpha_{\min}$ which is within $\delta$ of $1 - 2^{-k}$, where $\delta = 22k^2 2^{-2k}$. Let $T$ be the interval $[1 - 2^{-k} - \delta,\ 1 - 2^{-k} + \delta]$. Clearly, the minimum of $\phi$ is at least

$$\phi(1 - 2^{-k}) - \delta\times\max_{\alpha\in T}|\phi'(\alpha)|.$$

Using very crude bounds, it is easy to see from (70) that if $\alpha \in T$ then $|\phi'(\alpha)| < 2k2^k$. Now, since for $k > 1$ we have $\log(1 - 2^{-k}) > -2^{-k} - 2^{-2k}$, a simple calculation gives

$$\phi(1 - 2^{-k}) > 2^k\log 2 + \frac{\log 2}{2}(k-1) - 1 - 2k^2 2^{-k}. \tag{83}$$



Therefore,

$$\phi_{\min} \equiv \min_{\alpha\in(9/10,1]}\phi(\alpha) > 2^k\log 2 + \frac{\log 2}{2}(k-1) - 1 - 46k^3 2^{-k}.$$
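For a moderate $k$ one can watch the conclusion of the bootstrapping argument directly: the minimizer of $\phi$ sits at $1 - \alpha \approx 2^{-k}$ and the minimum is close to the right-hand side of (83). The sketch below writes $B - \alpha C$ as $(2^k - 7/2)(1 + k(1-\alpha)) + (k+6)/2$, an exact rearrangement of (68) and (69) (checked with `fractions.Fraction`) that avoids subtracting two numbers of size $k2^k$, and runs a golden-section search in $\log_2(1-\alpha)$. We use $k = 30$ purely for illustration: the proof needs $k > 166$, where double precision could not resolve the lower-order terms. The function names are ours.

```python
import math
from fractions import Fraction

LOG2 = math.log(2)


def b_minus_alpha_c(y: float, k: int) -> float:
    """B - alpha C for alpha = 1 - y, written as (2^k - 7/2)(1 + k y) + (k + 6)/2."""
    return (2.0**k - 3.5) * (1 + k * y) + (k + 6) / 2


def rearrangement_holds(k: int, alpha: Fraction) -> bool:
    """Exact check of B - alpha C = (2^k - 7/2)(1 + k(1 - alpha)) + (k + 6)/2, with B, C from (68), (69)."""
    B = Fraction(2**k * (k + 1)) - 3 * k - Fraction(1, 2)
    C = k * (Fraction(2**k) - Fraction(7, 2))
    rhs = (Fraction(2**k) - Fraction(7, 2)) * (1 + k * (1 - alpha)) + Fraction(k + 6, 2)
    return B - alpha * C == rhs


def phi(y: float, k: int) -> float:
    """phi(alpha) = (log 2 - h(alpha)) (B - alpha C), with y = 1 - alpha."""
    h = -y * math.log(y) - (1 - y) * math.log1p(-y)
    return (LOG2 - h) * b_minus_alpha_c(y, k)


def argmin_phi(k: int, iters: int = 200) -> float:
    """Golden-section search over t = log2(y) in [-k - 20, log2(0.1)]; returns the minimizing y.

    Assumes phi is unimodal in t on this range, which holds for the k used here.
    """
    lo, hi = -k - 20.0, math.log2(0.1)
    ratio = (math.sqrt(5) - 1) / 2
    a, b = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    fa, fb = phi(2.0**a, k), phi(2.0**b, k)
    for _ in range(iters):
        if fa < fb:  # the minimum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - ratio * (hi - lo)
            fa = phi(2.0**a, k)
        else:  # the minimum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + ratio * (hi - lo)
            fb = phi(2.0**b, k)
    return 2.0 ** ((lo + hi) / 2)


if __name__ == "__main__":
    k = 30
    y = argmin_phi(k)
    print(y * 2**k)  # close to 1, i.e. alpha_min is close to 1 - 2^{-k}
    print(phi(y, k) - (2**k * LOG2 + LOG2 / 2 * (k - 1) - 1))  # small, cf. (83)
```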

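Finally, recall that (56) holds as long as $r < (1-\epsilon_1)^k\phi_{\min} - 2\times(2/3)^k$. Using the upper bound for $\epsilon$ from (26), so that $\epsilon_1 = \epsilon_0/2 < 2^{-k} + \tfrac32 k4^{-k}$, together with $(1-\epsilon_1)^k \ge 1 - k\epsilon_1$, we get

$$\begin{aligned}(1-\epsilon_1)^k\phi_{\min} &> \left(1 - k2^{-k} - \tfrac32 k^2 4^{-k}\right)\left(2^k\log 2 + \frac{\log 2}{2}(k-1) - 1 - \frac{46k^3}{2^k}\right)\\ &> 2^k\log 2 - \frac{\log 2}{2}(k+1) - 1 - \frac{50k^3}{2^k}\\ &= p_k.\end{aligned}$$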



10 Bounds for specific values of k 

Recall from our discussion in Section 3 that in order to establish $r_k \ge r$ it suffices to prove that there exists some $\epsilon \in [0, 1)$ for which the function $g_r$ defined in (22), i.e.,

$$g_r(\alpha) = \frac{f(\alpha)^r}{\alpha^\alpha(1-\alpha)^{1-\alpha}} = \frac{\left((2-2\epsilon+\alpha\epsilon^2)^k - 2(1-\epsilon+\alpha\epsilon)^k + \alpha^k\right)^r}{\alpha^\alpha(1-\alpha)^{1-\alpha}}, \tag{84}$$

has a unique global maximum at 1/2. Recall also that for any $r$ the only choice of $\epsilon$ for which $g_r'(1/2) = 0$ is the one mandated by (23). Thus, for any fixed $k$ one can get a lower bound for $r_k$ by: i) solving (23), ii) substituting the solution into (84), and iii) plotting the resulting function to check whether $g_r(1/2) > g_r(\alpha)$ for all $\alpha \neq 1/2$. As $g_r$ never has more than three local maxima, this is very straightforward and yields the lower bounds referred to as "simple" lower bounds in the table below.

As mentioned in the Introduction, the simple weighting scheme yielding Theorem [?] does not yield the best possible lower bound afforded by applying the second moment method to balanced satisfying assignments. For that, one has to use the significantly more refined argument which we presented in Sections 7 through 9. That argument also eventually reduces to proving $g_r(1/2) > g_r(\alpha)$ for all $\alpha \neq 1/2$. Now, though, $\epsilon$ is allowed to depend on $\alpha$, subject only to $\epsilon \le \epsilon_0$, where $\epsilon_0$ is the solution of (23). Naturally, at $\alpha = 1/2$ one still has to take $\epsilon = \epsilon_0$ so that the derivative of $g_r$ vanishes, but for larger $\alpha$ (where the danger is) it turns out that decreasing $\epsilon$ somewhat helps. The bounds reported in Table 2 in the Introduction (and replicated below as the "refined" bounds) are, indeed, the result of such optimization of $\epsilon$ as a function of $\alpha$.

Specifically, for $k \le 5$ we considered 10,000 equally spaced values of $\alpha \in [0, 1]$ and for each such value found $\epsilon \le \epsilon_0$ such that the condition $g_r(\alpha,\epsilon) < g_r(1/2,\epsilon_0)$ holds with a bit of room. (For $k \ge 4$ we solved (23), defining $\epsilon_0$, numerically to 10 digits of accuracy. For the optimization we exploited convexity to speed up the search.) Having determined such values of $\epsilon$, we (implicitly) assigned to every point of $[0, 1]$ that was not chosen the value of $\epsilon$ at the nearest chosen point. Finally, we computed a (crude) upper bound on the derivative of $g_r$ with respect to $\alpha$ in $[0, 1]$. This bound on the derivative, along with our room factor, then implied that for every point that we did not check, the value of $g_r$ was sufficiently close to its value at the corresponding chosen point to also be dominated by $g_r(1/2,\epsilon_0)$. For $k > 5$, we only partitioned $[0, 1]$ into two intervals, namely $[1/10, 9/10]$ and its complement. Assigning the values $\epsilon_0$ and $\epsilon_0/2$, respectively, to all the points in each interval yielded the bounds for such $k$.
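The three-step recipe for the "simple" bounds is easy to automate. The sketch below is an illustration under stated assumptions, not the procedure used for the table: it assumes that (23) is the condition $(2-\epsilon)^{k-1}\epsilon = 1$ of (47) (which is what makes $g_r'(1/2)$ vanish), replaces "plotting" by a check on 10,001 equally spaced values of $\alpha$, and bisects on $r$. Bisection is justified because lowering $r$ can only help: where $f(\alpha) > f(1/2)$ the condition $r\log(f(\alpha)/f(1/2)) < \log 2 - h(\alpha)$ becomes easier, and elsewhere it holds for every $r \ge 0$. A grid check is not a proof; the rigorous numbers also need derivative bounds between grid points, as described above. The helper names are hypothetical.

```python
import math


def eps0(k: int) -> float:
    """Root in (0, 1/2) of eps * (2 - eps)**(k - 1) = 1, by bisection (k >= 3)."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * (2.0 - mid) ** (k - 1) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_g(alpha: float, r: float, k: int, eps: float) -> float:
    """log g_r(alpha) for g_r as in (84): r log f(alpha) + h(alpha)."""
    f = ((2 - 2 * eps + alpha * eps**2) ** k
         - 2 * (1 - eps + alpha * eps) ** k
         + alpha**k)
    entropy = sum(-t * math.log(t) for t in (alpha, 1 - alpha) if t > 0)
    return r * math.log(f) + entropy


def half_is_global_max(r: float, k: int, grid: int = 10_000) -> bool:
    """Grid version of step iii): g_r(1/2) > g_r(alpha) for alpha = i/grid, alpha != 1/2."""
    eps = eps0(k)
    top = log_g(0.5, r, k, eps)
    return all(log_g(i / grid, r, k, eps) < top
               for i in range(grid + 1) if 2 * i != grid)


def simple_lower_bound(k: int, tol: float = 1e-4) -> float:
    """Largest r (to within tol) that passes the grid check, found by bisection on r."""
    # r = 0 always passes; the first-moment bound 2^k log 2 serves as the initial upper end.
    lo, hi = 0.0, 2**k * math.log(2)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if half_is_global_max(mid, k):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(simple_lower_bound(3))
```

For $k = 3$ the search returns a value between 2.5 and 2.6, consistent with the "simple" entry 2.54 in the table below; the last digits depend on the grid and the tolerance.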



k     Upper bound    Refined lower bound    Simple lower bound
3     4.51           2.68                   2.54
4     10.23          7.91                   7.31
5     21.33          18.79                  17.61
7     87.88          84.82                  82.63
10    708.94         704.94                 701.53
20    726,817        726,809                726,802
21    1,453,635      1,453,626              1,453,619



11 Conclusions 

We proved that the random $k$-SAT threshold satisfies $r_k \sim 2^k\log 2$. In particular, we proved that random $k$-CNF formulas with density $2^k\log 2 - k(\log 2)/2 - O(1)$ have exponentially many balanced satisfying truth assignments. That is, truth assignments that have at least one satisfied literal in every clause yet, in total, satisfy only as many literal occurrences as a random truth assignment.

Our argument leaves a gap of order $O(k)$ with the first moment upper bound $2^k\log 2$. With respect to this gap it is worth pointing out that the best known techniques [?, 19] for improving this upper bound only give $r^*_k \le 2^k\log 2 - b_k$ where $b_k \to (1 + \log 2)/2$. At the same time, it is not hard to prove that for $r = 2^k\log 2 - k(\log 2)/2$, i.e., within an additive constant from our lower bound, w.h.p. there are no satisfying truth assignments that satisfy only $km/2 + o(km)$ literal occurrences. Thus, any asymptotic improvement over our lower bound would mean that tendencies toward the majority assignment become essential as we approach the threshold.

The gap between the upper bound and the best algorithmic lower bound, $r_k = \Omega(2^k/k)$, seems to us much more significant (and is certainly much bigger!). The lack of progress in the last ten years suggests the possibility that no polynomial-time algorithm can improve the lower bound asymptotically. At the same time, in a completely different direction, Mézard and Zecchina [21] recently used the non-rigorous cavity method of statistical physics to obtain detailed predictions for the satisfiability threshold, suggesting that $r_k = 2^k\log 2 - O(1)$. (See also [20] for an overview.) Insights from this analysis led them to an intriguing algorithm called "survey propagation" (described in [21, ?]) that seems to perform well on random instances of $k$-SAT close to the threshold, at least for small $k$. (Its performance is especially impressive for $k = 3$.) A rigorous analysis of this algorithm is still lacking, though, and it remains unclear whether its success for values of $r$ close to the threshold extends to large $k$.

The success of the second moment method for balanced satisfying truth assignments suggests that such assignments form a "mist" in $\{0,1\}^n$ and, as a result, they might be hard to find by algorithms based on local updates. Moreover, as $k$ increases the influence exerted by the majority vote assignment becomes less and less significant, as most literals occur very close to their expected $kr/2$ times. As a result, the structure of the space of solutions may well be different for small $k$ (e.g., $k = 3, 4$) and for larger $k$.

To summarize, the following key questions remain: 

1. Is $2^k\log 2 - r_k$ bounded?

2. Is there an algorithmic threshold $\lambda_k = o(2^k)$ so that for $r > \lambda_k$, no polynomial-time algorithm can find a satisfying truth assignment for the random formula $F_k(n, rn)$ with uniformly positive probability?